Abstract

The intention of this paper is to estimate a Bayesian distribution-free chain ladder (DFCL) 
model using approximate Bayesian computation (ABC) methodology. We demonstrate how to 
estimate quantities of interest in claims reserving and compare the estimates to those obtained 
from classical and credibility approaches. In this context, a novel numerical procedure utilising 
Markov chain Monte Carlo (MCMC), ABC and a Bayesian bootstrap procedure was developed in 
a truly distribution-free setting. The ABC methodology arises because we work in a distribution-free
setting in which we make no parametric assumptions, meaning we cannot evaluate the
likelihood point-wise or, in this case, simulate directly from the likelihood model. The use of a
bootstrap procedure allows us to generate samples from the intractable likelihood without the
requirement of distributional assumptions; this is crucial to the ABC framework. The developed
methodology is used to obtain the empirical distribution of the DFCL model parameters and 
the predictive distribution of the outstanding loss liabilities conditional on the observed claims. 
We then estimate predictive Bayesian capital estimates, the Value at Risk (VaR) and the mean 
square error of prediction (MSEP). The latter is compared with the classical bootstrap and 
credibility methods. 

Key words: Claims reserving, distribution-free chain ladder, mean square error of prediction, 
Bayesian chain ladder, approximate Bayesian computation, Markov chain Monte Carlo, 
annealing, bootstrap 
1. Motivation



The distribution-free chain ladder model (DFCL) of Mack [14] is a popular model for stochastic 
claims reserving. In this paper we use a time series formulation of the DFCL model which allows 
for bootstrapping the claims reserves. An important aspect of this model is that it can provide 
a justification for the classical deterministic chain ladder (CL) algorithm which originally was 
not founded on an underlying stochastic model. Moreover, it allows for the study of prediction 
uncertainties. Note that there are different stochastic models that lead to the CL reserves (see for 
example Wüthrich-Merz [30], Section 3.2). In the present paper we use the DFCL formulation
to reproduce the CL reserves. 

The paper presents a novel methodology for estimating a Bayesian DFCL model utilising a 
framework of approximate Bayesian computation (ABC) in a non-standard manner. A method- 
ology utilising Markov chain Monte Carlo (MCMC), ABC and a Bayesian bootstrap procedure 
is developed in a distribution-free setting. The ABC framework is required because we work in 
a distribution-free setting in which we make no parametric assumptions about the form of the 
likelihood. Effectively, the ABC methodology allows us to overcome the fact that we cannot 
evaluate the likelihood point-wise in the DFCL model. Typically, ABC methodology circum- 
vents likelihood evaluations by simulation from the likelihood. However, in this case simulation 
from the likelihood model is not directly available because no parametric assumption is made. 
We combine ABC methodology with bootstrap to overcome this additional complexity that the 
DFCL model presents in the ABC framework. Then, by using an MCMC numerical sampling 
algorithm combined with the novel version of ABC that has the embedded bootstrap procedure, 
we are able to obtain samples from the intractable posterior distribution of the DFCL model 
parameters. 

This allows us to utilise this methodology to obtain the Bayesian posterior distribution of the 
DFCL model parameters empirically. Then we demonstrate two approaches in which we can 
utilise the posterior samples for the DFCL model parameters to obtain the Bayesian predictive 
distribution of the claims. The first approach involves using each posterior sample to numeri- 
cally estimate the full predictive claims distribution given the observed claims. The alternative 
approach involves using the posterior samples for the DFCL model parameters to form Bayesian 
point estimators. Then, conditional on these point estimators, we can obtain the Bayesian 
conditional predictive distribution for the claims. The second approach will be relevant for 
comparisons with the classical and credibility approaches. The first approach has the benefit
that it integrates out of the Bayesian predictive claims distribution the parameter uncertainty 
associated with estimation of the DFCL chain ladder parameters. 

The paper then analyses the parameter estimates in the DFCL model, the associated claims 
reserves and the mean square errors of prediction (MSEP) from both the frequentist perspective 
and a contrasting Bayesian view. In doing so we analyse CL point estimators for parameters of 
the DFCL model, the resulting estimated reserves and the associated MSEP from the classical 
perspective. These include non-parametric bootstrap estimated prediction errors which can be 
obtained via one of two possible bootstrap procedures, conditional or unconditional. In this 
paper we consider the process of conditional back propagation; see [30] for in-depth discussion. 
These classical frequentist estimators are then compared to Bayesian point estimators. The 
Bayesian estimates considered are the maximum a posteriori (MAP) and the minimum mean 
square error (MMSE) estimators. For comparison with the classical frequentist reserve estimates, 
we also obtain the associated Bayesian estimated reserves conditional upon the Bayesian point 
estimators. 

In addition, since in the Bayesian setting we obtain samples from the posterior for the parameters,
we use these along with the MSEP obtained by the estimated Bayesian point estimators to obtain
associated posterior predictive intervals to be compared with the classical bootstrap procedures. 
We then robustify the prediction of reserves by Rao-Blackwellization, that is, we integrate out 
the influence of the unknown variance parameters in the DFCL model. Having done this, we 
analyse the resultant MSEP. This is again only achievable since in the Bayesian setting we obtain 
samples from the joint posterior for the CL factors and the variances. 

To summarize our contribution, the novelty within this paper involves the development and 
comparison of a new estimation methodology to work with the Bayesian CL model for the 
DFCL model which makes no parametric assumptions on the form of the likelihood function; 
see also Gisler-Wüthrich [12]. This is unlike the works of Yao [31] and Peters et al. [21] that
assume explicit distributions in order to construct the posterior distributions in the Bayesian 
context. Instead we demonstrate how to work directly with the intractable likelihood functions 
and the resulting intractable posterior distribution, using novel ABC methodology. In this 
regard we demonstrate that we do not need to make any parametric assumptions to perform 
posterior inference, avoiding potentially poor model assumptions made, as for example in the 
paper of Yao [31]. 
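Table 1 (layout): rows are accident years $i$ and columns are development years $j$. The
upper-left cells with $i + j \le I$ contain the observed random variables $C_{i,j} \in \mathcal{D}_I$;
the lower-right cells with $i + j > I$ contain the claims to be predicted, $C_{i,j} \in \mathcal{D}_I^c$.

Table 1: Claims development triangles.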

Outline of this paper. The paper begins with a presentation of the claims reserving problem 
and then presents the model we shall consider. This is followed by the description of the 
classical CL algorithm and the construction of a Bayesian model that can be used to estimate 
the parameters of the model. The Bayesian model is constructed in a distribution-free setting. 
This is followed by a discussion on classical versus Bayesian parameter estimators along with 
a bootstrap based procedure for the estimation of the parameter uncertainty in the classical 
setting. The next section presents the methodology of ABC coupled with a novel bootstrap 
based sampling procedure which will allow us to work directly with the distribution-free Bayesian 
model. We then illustrate the developed algorithm on a synthetic data set and the real data set, 
comparing performance to the classical results and those obtained via credibility theory. 

2. Claims development triangle and DFCL model 

We briefly outline the claims development triangle structure we utilise in the formulation of 
our models. Assume there is a run-off triangle containing claims development data with the 
structure given in Table 1. 

Assume that $C_{i,j}$ are cumulative claims with indices $i \in \{0, \ldots, I\}$ and $j \in \{0, \ldots, J\}$, where
i denotes the accident year and j denotes the development year (cumulative claims can refer 
to payments, claims incurred, etc). We make the simplifying assumption that the number of 
accident years is equal to the number of observed development periods, that is, I = J. At time 
$I$, we have observations

\[ \mathcal{D}_I = \{C_{i,j};\ i + j \le I\}, \tag{2.1} \]

and for claims reserving at time $I$ we need to predict the future claims

\[ \mathcal{D}_I^c = \{C_{i,j};\ i + j > I,\ i \le I,\ j \le J\}. \tag{2.2} \]

Moreover, we define the set $\mathcal{B}_j = \{C_{i,k};\ i + k \le I,\ 0 \le k \le j\}$ for $j \in \{0, \ldots, I\}$, that is, $\mathcal{B}_0$ is
the first column in Table 1.

2.1. Classical chain ladder algorithm 

In the classical (deterministic) chain ladder algorithm there is no underlying stochastic model. 
It is rather a recursive algorithm that is used to estimate the claims reserves and which has 
proved to give good practical results. It simply involves the following recursive steps to predict 
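unobserved cumulative claims in $\mathcal{D}_I^c$. Set $\hat{C}_{i,I-i} = C_{i,I-i}$ and for $j > I - i$

\[ \hat{C}_{i,j} = \hat{C}_{i,j-1}\, \hat{f}_{j-1}^{(CL)} \quad \text{with CL factor estimates} \quad \hat{f}_{j-1}^{(CL)} = \frac{\sum_{i=0}^{I-j} C_{i,j}}{\sum_{i=0}^{I-j} C_{i,j-1}}. \tag{2.3} \]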
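The recursion (2.3) can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the triangle is stored as a square NumPy array with `NaN` in the unobserved cells ($i + j > I$), and the function names `cl_factors` and `cl_complete` as well as the toy numbers are hypothetical choices made for this example.

```python
import numpy as np

def cl_factors(tri):
    """CL factor estimates f_hat[j] = sum_i C[i, j+1] / sum_i C[i, j], i = 0..I-j-1."""
    I = tri.shape[0] - 1                      # I = J by assumption
    return np.array([tri[: I - j, j + 1].sum() / tri[: I - j, j].sum()
                     for j in range(I)])

def cl_complete(tri):
    """Fill the cells with i + j > I via C_hat[i, j] = C_hat[i, j-1] * f_hat[j-1]."""
    I = tri.shape[0] - 1
    f = cl_factors(tri)
    full = tri.copy()
    for i in range(1, I + 1):
        for j in range(I - i + 1, I + 1):
            full[i, j] = full[i, j - 1] * f[j - 1]
    return full

# Toy cumulative triangle (illustrative numbers only; NaN marks unobserved cells)
tri = np.array([[100.0, 150.0, 165.0],
                [110.0, 176.0, np.nan],
                [120.0, np.nan, np.nan]])
f_hat = cl_factors(tri)      # development factors for j = 0, 1
full = cl_complete(tri)      # completed rectangle; last column holds the CL ultimates
```

The CL reserve of accident year $i$ is then the difference between the last column of `full` and the latest observed diagonal entry.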

Since this is a deterministic algorithm it does not allow for quantification of the uncertainty 
associated with the predicted reserves. To analyse the associated uncertainty there are several 
stochastic models that reproduce the CL reserves; for example Mack's distribution-free chain 
ladder model [14], the over-dispersed Poisson model (see England-Verrall [6]) or the Bayesian
chain ladder model (see Gisler-Wüthrich [12]). We use a time series formulation of the Bayesian
chain ladder model in order to use bootstrap methods and Bayesian inference. 

2.2. Bayesian DFCL model 

We use an additive time series version of the Bayes chain ladder model (Model Assumptions 3.1 
in Gisler-Wüthrich [12]).

Model Assumptions 2.1. 

1. We define the CL factors by $F = (F_0, \ldots, F_{J-1})$ and the standard deviation parameters
by $\Xi = (\Xi_0, \ldots, \Xi_{J-1})$. We assume independence between all these parameters, i.e. the
prior density of $(F, \Xi)$ is given by

\[ \pi(f, \sigma) = \prod_{j=0}^{J-1} \pi(f_j)\, \pi(\sigma_j), \tag{2.4} \]

where $\pi(f_j)$ denotes the density of $F_j$ and $\pi(\sigma_j)$ denotes the density of $\Xi_j$.
2. Conditionally, given $F = f = (f_0, \ldots, f_{J-1})$ and $\Xi = \sigma = (\sigma_0, \ldots, \sigma_{J-1})$, we have:

• Cumulative claims $C_{i,j}$ in different accident years $i$ are independent.

• Cumulative claims satisfy the following time series representation

\[ C_{i,j+1} = f_j\, C_{i,j} + \sigma_j \sqrt{C_{i,j}}\; \varepsilon_{i,j+1}, \tag{2.5} \]

where conditionally, given $\mathcal{B}_0$, we have that the residuals $\varepsilon_{i,j}$ are i.i.d. satisfying

\[ E[\varepsilon_{i,j} \mid \mathcal{B}_0, F, \Xi] = 0 \quad \text{and} \quad \mathrm{Var}[\varepsilon_{i,j} \mid \mathcal{B}_0, F, \Xi] = 1, \tag{2.6} \]

and $P[C_{i,j} > 0 \mid \mathcal{B}_0, F, \Xi] = 1$ for all $i, j$.

Remark. Note that the assumptions on the residuals are slightly involved in order to guarantee
that cumulative claims $C_{i,j}$ are positive $P$-a.s.

Corollary 2.2. Under Model Assumptions 2.1 we have that conditionally, given $\mathcal{D}_I$, the random
variables $(F_0, \Xi_0), \ldots, (F_{J-1}, \Xi_{J-1})$ are independent. Thus, we obtain the following posterior
distribution for $(F, \Xi)$, given $\mathcal{D}_I$,

\[ \pi(f, \sigma \mid \mathcal{D}_I) = \prod_{j=0}^{J-1} \pi(f_j, \sigma_j \mid \mathcal{D}_I). \tag{2.7} \]

This result follows from Theorem 3.2 in Gisler-Wüthrich [12]; from prior independence of the
parameters; and the fact that $C_{i,j+1}$ only depends on $F_j$, $\Xi_j$ and $C_{i,j}$ (Markov property). This
has important implications for the ABC sampling algorithm developed below.

In order to perform the Bayesian analysis we make explicit assumptions on the prior distributions 
of $(F, \Xi)$.

Model Assumptions 2.3. 

In addition to Model Assumptions 2.1 we assume that the prior model for all parameters $j \in
\{0, \ldots, J - 1\}$ is given by:

• $F_j \sim \Gamma(\alpha_j, \beta_j)$, where $\Gamma(\alpha_j, \beta_j)$ is a gamma distribution with mean $E[F_j] = \alpha_j \beta_j = \hat{f}_j^{(CL)}$
(see (2.3)) and large variance to have diffuse priors.

• The variances $\Xi_j^2 \sim IG(a_j, b_j)$, where $IG(a_j, b_j)$ is an inverse gamma distribution with
mean $E[\Xi_j^2] = b_j / (a_j - 1) = \hat{\sigma}_j^{2\,(CL)}$ (see (3.1) below) and large variance.
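To make the prior specification concrete, the hyperparameters can be obtained by moment matching. The sketch below is only an illustration: it assumes the gamma prior is parameterised by shape $\alpha_j$ and scale $\beta_j$ (consistent with the mean $\alpha_j \beta_j$ above), that the inverse gamma prior has shape $a_j > 2$ so that its variance exists, and that the "large variance" is a user-chosen number; the helper names and the numerical values are hypothetical.

```python
def gamma_hyper(mean, var):
    """Shape alpha and scale beta with E = alpha * beta and Var = alpha * beta**2."""
    beta = var / mean
    alpha = mean / beta
    return alpha, beta

def inv_gamma_hyper(mean, var):
    """Shape a and scale b with E = b / (a - 1) and Var = b**2 / ((a - 1)**2 * (a - 2))."""
    a = mean ** 2 / var + 2.0
    b = mean * (a - 1.0)
    return a, b

# Example: prior mean equal to a CL estimate of 1.5, prior variance 100 (diffuse)
alpha_j, beta_j = gamma_hyper(1.5, 100.0)
# Example: prior mean equal to a variance estimate of 2.0, prior variance 100
a_j, b_j = inv_gamma_hyper(2.0, 100.0)
```

As the chosen variance grows, $\alpha_j$ tends to zero and $a_j$ tends to 2, which is the diffuse limit under this parameterisation.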
Remarks

1. The likelihood model is intractable, meaning that no density can be written down analytically
in the DFCL model. In formulating the Bayesian model we have only made
distributional assumptions on the priors for the parameters $(F, \Xi)$ but not on the observable
cumulative claims $C_{i,j}$. Though we make distributional assumptions for the priors, the
model is distribution-free because no distributional assumptions on the cumulative claims
are made. As a result of only making assumptions on the priors, a standard Bayesian
analysis using analytic posterior distributions cannot be performed. One way out of this
dilemma would be to re-formulate the Bayesian model by making distributional assumptions
(for example, this is done in Yao [31]) but then the model is no longer distribution-free.
Another approach would be to use credibility methods (see Gisler-Wüthrich [12])
but this only gives statements for the first two moments. In the present set up we develop
ABC methods that allow for a full distributional answer for the posterior distributions
without making explicit distributional assumptions for the cumulative claims $C_{i,j}$.

2. Our priors are chosen as diffuse priors with large variances. This again highlights the
difference between specifying the prior distributions and making distributional assumptions
for the actual likelihood model; these are separate ideas.

3. We select the priors to ensure that we maintain several relevant aspects of the DFCL 
model. In particular, it is important to utilise priors that enforce the strict positivity of 
the parameters $f_j, \sigma_j > 0$. We note here that the parametric Bayesian model developed
in Yao [31] failed in this aspect when it came to prior specification. Therefore we develop 
an alternative prior structure that satisfies these required properties of the DFCL model. 

3. DFCL model parameter estimators 

This section considers both classical and Bayesian estimators for the chain ladder framework, 
including both the chain ladder factors and the variance parameters. 

3.1. Classical 

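In the classical CL method, the CL factors are estimated by $\hat{f}_j^{(CL)}$ given in (2.3). The variance
parameters are estimated by

\[ \hat{\sigma}_j^{2\,(CL)} = \frac{1}{I-j-1} \sum_{i=0}^{I-j-1} C_{i,j} \left( \frac{C_{i,j+1}}{C_{i,j}} - \hat{f}_j^{(CL)} \right)^2, \tag{3.1} \]

see (3.4) in Wüthrich-Merz [30].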

Note that this estimator is only well-defined for j < I — 1. There is a vast literature and 
discussion on the estimation of tail parameters. We do not enter this discussion here but we 
simply choose the estimator given in Mack [14] for the last variance parameter which is defined 
by 

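\[ \hat{\sigma}_{J-1}^{2\,(CL)} = \min\left\{ \frac{\hat{\sigma}_{J-2}^{4\,(CL)}}{\hat{\sigma}_{J-3}^{2\,(CL)}},\ \hat{\sigma}_{J-3}^{2\,(CL)},\ \hat{\sigma}_{J-2}^{2\,(CL)} \right\}. \tag{3.2} \]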
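The classical variance estimators can be computed directly from the observed triangle. The following sketch assumes the standard Mack form of the estimator for $j < I - 1$ together with the tail choice (3.2), a square NumPy array with `NaN` in unobserved cells, and at least four accident years so that $\hat{\sigma}_{J-3}^{2\,(CL)}$ exists; the function name `cl_sigma2` and the toy data are hypothetical.

```python
import numpy as np

def cl_sigma2(tri):
    """Return CL factors f_hat and variance estimates sigma2_hat (tail set by the min rule)."""
    I = tri.shape[0] - 1
    f = np.array([tri[: I - j, j + 1].sum() / tri[: I - j, j].sum()
                  for j in range(I)])
    s2 = np.empty(I)
    for j in range(I - 1):                     # estimator is only defined for j < I - 1
        c0, c1 = tri[: I - j, j], tri[: I - j, j + 1]
        s2[j] = (c0 * (c1 / c0 - f[j]) ** 2).sum() / (I - j - 1)
    # Tail variance: min{ s2[J-2]^2 / s2[J-3], s2[J-3], s2[J-2] }
    s2[I - 1] = min(s2[I - 2] ** 2 / s2[I - 3], s2[I - 3], s2[I - 2])
    return f, s2

# Toy 4 x 4 cumulative triangle (illustrative numbers only)
tri = np.array([[100.0, 150.0, 165.0, 170.0],
                [110.0, 176.0, 190.0, np.nan],
                [120.0, 170.0, np.nan, np.nan],
                [130.0, np.nan, np.nan, np.nan]])
f_hat, sigma2_hat = cl_sigma2(tri)
```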

3.2. Bayesian 

In a Bayesian inference context one calculates the posterior distribution of the parameters, 
given $\mathcal{D}_I$. As in (2.7) we denote this posterior by $\pi(f, \sigma \mid \mathcal{D}_I)$. Since the MCMC-ABC bootstrap
procedure will allow us to obtain samples from the posterior distribution of the Bayesian DFCL 
model presented, we can now consider estimating CL point estimators using these samples. 

There are two commonly used point estimators in Bayesian analysis that correspond to the 
posterior mode (MAP) and the posterior mean (MMSE), respectively: 

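\[ \left( \hat{f}_j^{(MAP)}, \hat{\sigma}_j^{(MAP)} \right) = \operatorname*{arg\,max}_{f_j, \sigma_j}\ \pi(f_j, \sigma_j \mid \mathcal{D}_I), \tag{3.3} \]

and

\[ \hat{f}_j^{(MMSE)} = \int f_j\, \pi(f_j \mid \mathcal{D}_I)\, df_j = E[F_j \mid \mathcal{D}_I], \tag{3.4} \]

\[ \hat{\sigma}_j^{(MMSE)} = \int \sigma_j\, \pi(\sigma_j \mid \mathcal{D}_I)\, d\sigma_j = E[\Xi_j \mid \mathcal{D}_I]. \tag{3.5} \]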



In the case in which $f_j$ is not independent of $\sigma_j$, the MAP estimators obtained through joint
maximization are optimal. However, in practice one often works with marginal estimators for
simplicity. Additionally, note that for diffuse priors we find (see Corollary 5.1 in Gisler-Wüthrich
[12])

\[ \hat{f}_j^{(MMSE)} \approx \hat{f}_j^{(CL)}. \tag{3.6} \]

Hence, using Corollary 2.2, we obtain the approximation 

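\[
\begin{aligned}
E[C_{i,J} \mid \mathcal{D}_I] &= E\big[ E[C_{i,J} \mid \mathcal{D}_I, F, \Xi] \,\big|\, \mathcal{D}_I \big] = C_{i,I-i}\, E\Big[ \prod_{j=I-i}^{J-1} F_j \,\Big|\, \mathcal{D}_I \Big] \\
&= C_{i,I-i} \prod_{j=I-i}^{J-1} E[F_j \mid \mathcal{D}_I] = C_{i,I-i} \prod_{j=I-i}^{J-1} \hat{f}_j^{(MMSE)} \\
&\approx C_{i,I-i} \prod_{j=I-i}^{J-1} \hat{f}_j^{(CL)} = \hat{C}_{i,J},
\end{aligned}
\tag{3.7}
\]

where on the last line we have an equality if the diffusivity of the priors $\pi(f_j)$ tends to infinity.
This is exactly the argument why the Bayesian CL model can be used to justify the CL predictors;
see Gisler-Wüthrich [12].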
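Given posterior draws for one development period $j$, the point estimators (3.3)-(3.5) can be approximated empirically. The sketch below is one simple possibility and not necessarily the procedure used for the results in this paper: the MMSE estimate is the sample mean, and the joint MAP estimate is approximated by the centre of the fullest cell of a two-dimensional histogram (the bin count is an assumption). The draws used here are synthetic stand-ins, not output of the MCMC-ABC sampler.

```python
import numpy as np

def mmse(samples):
    """Posterior mean, estimated by the sample average."""
    return float(np.mean(samples))

def joint_map(f_samples, s_samples, bins=30):
    """Crude joint posterior mode: centre of the fullest cell of a 2-D histogram."""
    hist, f_edges, s_edges = np.histogram2d(f_samples, s_samples, bins=bins)
    k, l = np.unravel_index(np.argmax(hist), hist.shape)
    return (0.5 * (f_edges[k] + f_edges[k + 1]),
            0.5 * (s_edges[l] + s_edges[l + 1]))

# Stand-in "posterior" draws for one development period (synthetic, illustration only)
rng = np.random.default_rng(0)
f_draws = rng.normal(1.5, 0.05, size=20000)
s_draws = rng.gamma(shape=4.0, scale=0.5, size=20000)

f_mmse, s_mmse = mmse(f_draws), mmse(s_draws)
f_map, s_map = joint_map(f_draws, s_draws)
```

For a skewed marginal such as the one used for `s_draws`, the mode and the mean differ noticeably, which is why the MAP and MMSE estimates are reported separately.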



3.3. Full predictive distribution and VaR 

In addition, the posterior samples for the DFCL model parameters, obtained via the MCMC- 
ABC bootstrap procedure, will allow us to obtain the predictive distribution of the claims in 
two ways. The first is the full predictive distribution of the claims obtained after integrating out 
the posterior uncertainty associated with the Bayesian DFCL model parameters to empirically 
estimate 

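\[ \pi(\mathcal{D}_I^c \mid \mathcal{D}_I) = \int\!\!\int \pi(\mathcal{D}_I^c \mid f, \sigma)\, \pi(f, \sigma \mid \mathcal{D}_I)\, df\, d\sigma. \tag{3.8} \]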

In practice, this numerical procedure involves taking each posterior sample for the DFCL model 
parameters and obtaining an estimate of the predicted claims. 

The second approach involves using one of the Bayesian point estimators for the parameters 
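such as the MMSE to obtain $\pi(\mathcal{D}_I^c \mid \hat{f}^{(MMSE)}, \hat{\sigma}^{(MMSE)})$. Alternatively, one may consider a
Rao-Blackwellised version of the Bayesian predictive distribution of claims involving

\[ \pi(\mathcal{D}_I^c \mid \hat{f}^{(MMSE)}, \mathcal{D}_I) = \int \pi(\mathcal{D}_I^c \mid \hat{f}^{(MMSE)}, \sigma)\, \pi(\sigma \mid \hat{f}^{(MMSE)}, \mathcal{D}_I)\, d\sigma \]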

having numerically integrated out the Bayesian posterior uncertainty associated with the DFCL 
variance parameters. Such methods are typically known as empirical Bayesian approaches. 

These results can then be applied to estimate any risk measures. For example, if we fix a security 
level 95% we can calculate the VaR on that level, which is defined by 



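\[ \mathrm{VaR}_{0.95}\big( C_{i,J} - E[C_{i,J} \mid \mathcal{D}_I] \,\big|\, \mathcal{D}_I \big) = \min\big\{ x;\ P\big[ C_{i,J} - E[C_{i,J} \mid \mathcal{D}_I] > x \,\big|\, \mathcal{D}_I \big] \le 0.05 \big\}. \tag{3.9} \]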
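Once draws from a predictive distribution of $C_{i,J}$ are available, the VaR in (3.9) can be estimated by the corresponding empirical quantile of the centred draws. The sketch below assumes the predictive mean is estimated by the sample mean; the function name `empirical_var` is hypothetical and the lognormal draws are a synthetic stand-in for predictive samples.

```python
import math
import numpy as np

def empirical_var(samples, level=0.95):
    """Smallest x with empirical P[X - mean(X) > x] <= 1 - level."""
    centred = np.sort(np.asarray(samples, dtype=float))
    centred = centred - centred.mean()
    k = math.ceil(level * len(centred)) - 1   # index of the empirical level-quantile
    return float(centred[k])

rng = np.random.default_rng(0)
ultimate = rng.lognormal(mean=0.0, sigma=0.5, size=10000)  # stand-in predictive draws
var95 = empirical_var(ultimate, 0.95)
```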



4. Bootstrap and mean square error of prediction 

Assume that we have calculated the Bayesian predictor or the CL predictor given in (3.7). 
Then we would like to determine the prediction uncertainty, that is, we would like to study 
the deviation of $C_{i,J}$ around its predictor. If one is only interested in second moments, the
so-called conditional mean square error of prediction (MSEP), one can often estimate the error 
terms analytically. However, other uncertainty measures like Value-at-Risk (VaR) can only be 
determined numerically; see (3.9). 
A popular numerical method is the bootstrap method. The bootstrap technique was developed
by Efron [3] and extended by Efron-Tibshirani [4] and Davison-Hinkley [1]. In the actuarial 
literature the development of bootstrap procedures includes the work of Taylor [27], Taylor- 
McGuire [28], [29], England-Verrall [5], [7] and Pinheiro et al. [19].

This procedure allows one to obtain information regarding an aggregated distribution given 
a single realisation of the data. To apply the bootstrap procedure one introduces a minimal 
amount of model structure such that resampling observations can be achieved using observed 
samples of the data. 

In this section we present a bootstrap algorithm in the classical frequentist approach. That is, 
we assume that the CL factors $F = f$ and the standard deviation parameters $\Xi = \sigma$ given in
Model Assumptions 2.1 are unknown constants. The bootstrap then generates synthetic data
denoted by $\mathcal{D}_I^*$ that allow for the study of the fluctuations of $\hat{f}^{(CL)}$ and $\hat{\sigma}^{2\,(CL)}$ (for details see
Section 7.4 in Wüthrich-Merz [30]). In the presented text we restrict ourselves to the conditional
resampling approach presented in Section 7.4.2 of Wüthrich-Merz [30].



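4.1. Non-parametric classical bootstrap (conditional version)

1. Calculate estimated residuals $\hat{\varepsilon}_{i,j}$ for $i + j \le I$, $j > 0$, conditional on the estimators $\hat{f}_{0:J-1}^{(CL)}$
and $\hat{\sigma}_{0:J-1}^{(CL)}$ and the observed data $\mathcal{D}_I$:

\[ \hat{\varepsilon}_{i,j} = \varepsilon_{i,j}\big( \hat{f}_{j-1}^{(CL)}, \hat{\sigma}_{j-1}^{(CL)} \big) = \frac{C_{i,j} - \hat{f}_{j-1}^{(CL)} C_{i,j-1}}{\hat{\sigma}_{j-1}^{(CL)} \sqrt{C_{i,j-1}}}. \]

2. These residuals $(\hat{\varepsilon}_{i,j})_{i+j \le I}$ give the empirical bootstrap distribution $\hat{F}_{\mathcal{D}_I}$.

3. Sample i.i.d. residuals $\varepsilon_{i,j}^* \sim \hat{F}_{\mathcal{D}_I}$ for $i + j \le I$, $j > 0$.

4. Generate bootstrap observations (conditional resampling)

\[ C_{i,j}^* = \hat{f}_{j-1}^{(CL)} C_{i,j-1} + \hat{\sigma}_{j-1}^{(CL)} \sqrt{C_{i,j-1}}\; \varepsilon_{i,j}^*, \]

which defines $\mathcal{D}_I^* = \mathcal{D}_I^*(\hat{f}^{(CL)}, \hat{\sigma}^{(CL)})$. Note that for the unconditional version of bootstrap
we should generate $C_{i,j}^* = \hat{f}_{j-1}^{(CL)} C_{i,j-1}^* + \hat{\sigma}_{j-1}^{(CL)} \sqrt{C_{i,j-1}^*}\; \varepsilon_{i,j}^*$; for a discussion on this
approach, see Section 7.4.1 of [30].

5. Calculate bootstrapped CL parameters $\hat{f}_j^*$ and $\hat{\sigma}_j^{2*}$ by

\[ \hat{f}_j^* = \frac{\sum_{i=0}^{I-j-1} C_{i,j+1}^*}{\sum_{i=0}^{I-j-1} C_{i,j}}, \qquad \hat{\sigma}_j^{2*} = \frac{1}{I-j-1} \sum_{i=0}^{I-j-1} C_{i,j} \left( \frac{C_{i,j+1}^*}{C_{i,j}} - \hat{f}_j^* \right)^2. \]

6. Repeat steps 3-5 and obtain empirical distributions from the bootstrap samples $C_{i,j}^*$, $\hat{f}_j^*$
and $\hat{\sigma}_j^{2*}$. These are then used to quantify the parameter estimation uncertainty.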
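Steps 1-6 above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the triangle is a square NumPy array with `NaN` in unobserved cells and at least four accident years, the tail variance of each bootstrap sample is set by the same min rule as in (3.2) (the text does not specify this), and the names `estimate` and `conditional_bootstrap` are hypothetical.

```python
import numpy as np

def estimate(base, succ):
    """f_hat[j] and sigma2_hat[j], using C[i, j] from `base` and C[i, j+1] from `succ`."""
    I = base.shape[0] - 1
    f, s2 = np.empty(I), np.empty(I)
    for j in range(I):
        c0, c1 = base[: I - j, j], succ[: I - j, j + 1]
        f[j] = c1.sum() / c0.sum()
        if j < I - 1:
            s2[j] = (c0 * (c1 / c0 - f[j]) ** 2).sum() / (I - j - 1)
    # Tail variance via the min rule; reusing it for bootstrap samples is an assumption.
    a, b = s2[I - 3], s2[I - 2]
    s2[I - 1] = min(b * b / a, a, b) if a > 0 else 0.0
    return f, s2

def conditional_bootstrap(tri, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    I = tri.shape[0] - 1
    f, s2 = estimate(tri, tri)
    sig = np.sqrt(s2)
    # Steps 1-2: residuals on the observed cells with i + j <= I, j > 0
    cells = [(i, j) for j in range(1, I + 1) for i in range(I - j + 1)]
    res = np.array([(tri[i, j] - f[j - 1] * tri[i, j - 1])
                    / (sig[j - 1] * np.sqrt(tri[i, j - 1])) for i, j in cells])
    f_star, s2_star = np.empty((n_boot, I)), np.empty((n_boot, I))
    for b in range(n_boot):
        eps = rng.choice(res, size=len(cells), replace=True)      # step 3
        boot = tri.copy()
        for (i, j), e in zip(cells, eps):                          # step 4 (conditional)
            boot[i, j] = (f[j - 1] * tri[i, j - 1]
                          + sig[j - 1] * np.sqrt(tri[i, j - 1]) * e)
        f_star[b], s2_star[b] = estimate(tri, boot)                # step 5
    return f, s2, f_star, s2_star                                  # step 6: empirical laws

tri = np.array([[100.0, 150.0, 165.0, 170.0],
                [110.0, 176.0, 190.0, np.nan],
                [120.0, 170.0, np.nan, np.nan],
                [130.0, np.nan, np.nan, np.nan]])
f_hat, sigma2_hat, f_star, sigma2_star = conditional_bootstrap(tri, n_boot=200, seed=1)
```

Each row of `f_star` is one bootstrap replicate of the vector of CL factors; the spread of these rows reflects the parameter estimation uncertainty.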



This non-parametric classical bootstrap method can be seen as a frequentist approach. This 
means that we do not express our parameter uncertainty by the choice of an appropriate prior 
distribution. We rather use a point estimator for the unknown parameters and then study the 
possible fluctuations of this point estimator. 

The main difficulty now is that the non-parametric bootstrap method, as described above, 
underestimates the "true" uncertainty. This comes from the fact that the estimated residuals $\hat{\varepsilon}_{i,j}$,
in general, have variance smaller than 1 (see formula (7.23) in Wüthrich-Merz [30]). This means
that our estimated residuals are not appropriately scaled. Therefore, frequentists use several
different scalings to correct this fact (see formula (7.24) in Wüthrich-Merz [30] or England-Verrall
[6]). Here, we use a different approach by introducing the novel Bayesian bootstrap method
embedded within an MCMC-ABC algorithm to obtain empirically the posterior distribution of 
the Bayesian DFCL model, described below. Having obtained this, we can then calculate all 
required Bayesian parameter estimates, capital reserve estimates and associated risk measures 
such as VaR. Before presenting the methodology for this novel MCMC-ABC algorithm we will 
finalize this section with the decompositions of the MSEP under frequentist, Bayesian and 
credibility approaches. 

4.2. Frequentist bootstrap estimates

Let us for the time-being concentrate on the conditional MSEP given by 

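\[ \mathrm{msep}_{C_{i,J} \mid \mathcal{D}_I}\big( \hat{C}_{i,J} \big) = E\Big[ \big( C_{i,J} - \hat{C}_{i,J} \big)^2 \,\Big|\, \mathcal{D}_I \Big] = \mathrm{Var}(C_{i,J} \mid \mathcal{D}_I) + \big( E[C_{i,J} \mid \mathcal{D}_I] - \hat{C}_{i,J} \big)^2. \tag{4.1} \]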
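The first term is known as the conditional process variance and the second term as the parameter
estimation uncertainty. In the frequentist approach (i.e. for given deterministic $F = f$ and
$\Xi = \sigma$) these terms can be calculated as

\[ \mathrm{Var}(C_{i,J} \mid \mathcal{D}_I) = \big( E[C_{i,J} \mid C_{i,I-i}] \big)^2 \sum_{j=I-i}^{J-1} \frac{\sigma_j^2 / f_j^2}{E[C_{i,j} \mid C_{i,I-i}]} \overset{\text{def.}}{=} C_{i,I-i}\, \Gamma_{I-i}, \tag{4.2} \]

and

\[ \big( E[C_{i,J} \mid \mathcal{D}_I] - \hat{C}_{i,J} \big)^2 = C_{i,I-i}^2 \Big( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \hat{f}_j^{(CL)} \Big)^2 \overset{\text{def.}}{=} C_{i,I-i}^2\, \Delta_{I-i}, \tag{4.3} \]

see Wüthrich-Merz [30], Section 3.2.

The process variance (4.2) is estimated by replacing the parameters by their estimators,

\[ \widehat{\mathrm{Var}}(C_{i,J} \mid \mathcal{D}_I) = \big( \hat{C}_{i,J} \big)^2 \sum_{j=I-i}^{J-1} \frac{\hat{\sigma}_j^{2\,(CL)} / \big( \hat{f}_j^{(CL)} \big)^2}{\hat{C}_{i,j}} \overset{\text{def.}}{=} C_{i,I-i}\, \hat{\Gamma}_{I-i}^{freq}. \tag{4.4} \]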



The parameter estimation error is more involved and there we need the bootstrap algorithm. Assume
that the bootstrap method gives $T$ bootstrap samples $\hat{f}^{*(1)}, \ldots, \hat{f}^{*(T)}$. Then the parameter
estimation error (4.3) is estimated by the sample variance of the product of the bootstrap observation
chain ladder parameter estimates $\hat{f}^{*(1)}, \ldots, \hat{f}^{*(T)}$, which gives the estimator $C_{i,I-i}^2\, \hat{\Delta}_{I-i}^{freq}$.
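The two frequentist terms can be computed as in the following sketch. It uses the product form $\sum_{j} \prod_{m<j} \hat{f}_m\, \hat{\sigma}_j^2 \prod_{n>j} \hat{f}_n^2$ for the process variance term, which is obtained from (4.4) by substituting $\hat{C}_{i,j} = C_{i,I-i} \prod_{m=I-i}^{j-1} \hat{f}_m^{(CL)}$. The function name, the argument layout (bootstrap factors stored row-wise in `f_star`) and the toy numbers are assumptions made for this illustration.

```python
import numpy as np

def freq_msep_terms(f, s2, f_star, i):
    """Gamma_hat^freq and Delta_hat^freq for accident year i, with I = J = len(f)."""
    I = len(f)
    start = I - i
    # Process variance term divided by C_{i,I-i}
    gamma = sum(np.prod(f[start:j]) * s2[j] * np.prod(f[j + 1:] ** 2)
                for j in range(start, I))
    # Estimation error term: sample variance of the product of bootstrapped factors
    delta = np.var(np.prod(f_star[:, start:], axis=1), ddof=1)
    return gamma, delta

f = np.array([2.0, 1.5])              # toy CL factors
s2 = np.array([1.0, 0.5])             # toy variance estimates
f_star = np.array([[2.0, 1.5],        # two toy bootstrap replicates
                   [2.2, 1.5]])
gamma2, delta2 = freq_msep_terms(f, s2, f_star, i=2)
# msep estimate for accident year i: C[i, I-i] * gamma + C[i, I-i]**2 * delta
```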



4.3. Bayesian estimates

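In the Bayesian setup (i.e. choosing prior distributions for the unknown parameters $F$ and $\Xi$),
we obtain a natural decomposition of the conditional MSEP:

\[
\begin{aligned}
\mathrm{msep}_{C_{i,J} \mid \mathcal{D}_I}\big( E[C_{i,J} \mid \mathcal{D}_I] \big) &= \mathrm{Var}(C_{i,J} \mid \mathcal{D}_I) \\
&= E\big[ \mathrm{Var}(C_{i,J} \mid \mathcal{D}_I, F, \Xi) \,\big|\, \mathcal{D}_I \big] + \mathrm{Var}\big( E[C_{i,J} \mid \mathcal{D}_I, F, \Xi] \,\big|\, \mathcal{D}_I \big).
\end{aligned}
\tag{4.5}
\]

The average process variance is given by (see Wüthrich-Merz [30], Lemma 3.6)

\[
\begin{aligned}
E\big[ \mathrm{Var}(C_{i,J} \mid \mathcal{D}_I, F, \Xi) \,\big|\, \mathcal{D}_I \big] &= C_{i,I-i} \sum_{j=I-i}^{J-1} E\Big[ \prod_{m=I-i}^{j-1} F_m\; \Xi_j^2 \prod_{n=j+1}^{J-1} F_n^2 \,\Big|\, \mathcal{D}_I \Big] \\
&= C_{i,I-i} \sum_{j=I-i}^{J-1} \prod_{m=I-i}^{j-1} E[F_m \mid \mathcal{D}_I]\; E[\Xi_j^2 \mid \mathcal{D}_I] \prod_{n=j+1}^{J-1} E[F_n^2 \mid \mathcal{D}_I] \overset{\text{def.}}{=} C_{i,I-i}\, \Gamma_{I-i}^{Bayes},
\end{aligned}
\tag{4.6}
\]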



where we have used posterior independence (2.7). The parameter estimation error is given by 

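\[ \mathrm{Var}\big( E[C_{i,J} \mid \mathcal{D}_I, F, \Xi] \,\big|\, \mathcal{D}_I \big) = C_{i,I-i}^2\, \mathrm{Var}\Big( \prod_{j=I-i}^{J-1} F_j \,\Big|\, \mathcal{D}_I \Big) \overset{\text{def.}}{=} C_{i,I-i}^2\, \Delta_{I-i}^{Bayes}, \tag{4.7} \]

where we have used (3.7). Using (2.7), we obtain for the last term

\[ C_{i,I-i}^2\, \Delta_{I-i}^{Bayes} = C_{i,I-i}^2 \Big[ \prod_{j=I-i}^{J-1} E[F_j^2 \mid \mathcal{D}_I] - \prod_{j=I-i}^{J-1} E[F_j \mid \mathcal{D}_I]^2 \Big]. \tag{4.8} \]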



In order to calculate these two terms given in (4.6) and (4.8), we need to calculate the posterior
distribution of $(F, \Xi)$, given $\mathcal{D}_I$. Since we do not have a full distributional model, we cannot
write down the likelihood function, which would allow for analytical solutions or Markov chain 
Monte Carlo (MCMC) simulations. Therefore we introduce the ABC framework which allows for 
distribution-free simulations using appropriate bootstrap samples and a distance metric. This 
will be discussed in Section 5. 
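Once posterior samples of $(F, \Xi^2)$ are available, the terms (4.6) and (4.8) reduce to products of posterior moments, which can be estimated by sample averages. The sketch below assumes the samples are stored as arrays with one row per draw and one column per development period; the function name and the tiny two-draw arrays are hypothetical and serve only to make the arithmetic checkable by hand.

```python
import numpy as np

def bayes_msep_terms(F, S2, i):
    """Gamma^Bayes and Delta^Bayes for accident year i from posterior draws.

    F[t, j]  : draw t of the CL factor F_j
    S2[t, j] : draw t of the variance parameter Xi_j^2
    """
    J = F.shape[1]
    start = J - i                      # I = J by assumption
    m1 = F.mean(axis=0)                # E[F_j | D_I]
    m2 = (F ** 2).mean(axis=0)         # E[F_j^2 | D_I]
    v = S2.mean(axis=0)                # E[Xi_j^2 | D_I]
    gamma = sum(np.prod(m1[start:j]) * v[j] * np.prod(m2[j + 1:])
                for j in range(start, J))
    delta = np.prod(m2[start:]) - np.prod(m1[start:] ** 2)
    return gamma, delta

F = np.array([[1.0, 2.0],
              [3.0, 2.0]])
S2 = np.array([[1.0, 1.0],
               [1.0, 3.0]])
gamma_b, delta_b = bayes_msep_terms(F, S2, i=2)
# conditional MSEP for accident year i: C[i, I-i] * gamma_b + C[i, I-i]**2 * delta_b
```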



4.4. Credibility estimates

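As mentioned previously, we can also consider the credibility estimates given in Gisler-Wüthrich
[12]. As long as we are only interested in the second moments (i.e. conditional MSEP) we can
also use credibility estimators, which are minimum variance estimators that are linear in the
observations. For diffuse priors we obtain the approximation given in Corollary 7.2 of
Gisler-Wüthrich [12]

\[ \widehat{\mathrm{msep}}_{C_{i,J} \mid \mathcal{D}_I}\big( E[C_{i,J} \mid \mathcal{D}_I] \big) = C_{i,I-i}\, \hat{\Gamma}_{I-i}^{cred} + C_{i,I-i}^2\, \hat{\Delta}_{I-i}^{cred}, \tag{4.9} \]

where

\[
\begin{aligned}
\hat{\Gamma}_{I-i}^{cred} &= \sum_{j=I-i}^{J-1} \Bigg\{ \prod_{m=I-i}^{j-1} \hat{f}_m^{(CL)}\; \hat{\sigma}_j^{2\,(CL)} \prod_{n=j+1}^{J-1} \Bigg( \big( \hat{f}_n^{(CL)} \big)^2 + \frac{\hat{\sigma}_n^{2\,(CL)}}{\sum_{k=0}^{I-n-1} C_{k,n}} \Bigg) \Bigg\}, \\
\hat{\Delta}_{I-i}^{cred} &= \prod_{j=I-i}^{J-1} \Bigg( \big( \hat{f}_j^{(CL)} \big)^2 + \frac{\hat{\sigma}_j^{2\,(CL)}}{\sum_{k=0}^{I-j-1} C_{k,j}} \Bigg) - \prod_{j=I-i}^{J-1} \big( \hat{f}_j^{(CL)} \big)^2.
\end{aligned}
\tag{4.10}
\]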

In the results section we compare the frequentist bootstrap approach, the credibility approach 
and the ABC bootstrap approach that is described below (see Table 7 below). 
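For comparison purposes, the credibility terms in (4.10) can be evaluated directly from the triangle and the classical estimates. The sketch below assumes a square NumPy array with `NaN` in unobserved cells and takes $\hat{f}^{(CL)}$ and $\hat{\sigma}^{2\,(CL)}$ as inputs; the function name and the toy values (chosen so that the result can be verified by hand) are hypothetical.

```python
import numpy as np

def cred_msep_terms(tri, f, s2, i):
    """Gamma_hat^cred and Delta_hat^cred of (4.10) for accident year i."""
    I = len(f)
    start = I - i
    # Column sums over the accident years that have an observed successor
    col_sums = np.array([tri[: I - j, j].sum() for j in range(I)])
    second = f ** 2 + s2 / col_sums        # (f_hat_j)^2 + sigma2_hat_j / sum_k C[k, j]
    gamma = sum(np.prod(f[start:j]) * s2[j] * np.prod(second[j + 1:])
                for j in range(start, I))
    delta = np.prod(second[start:]) - np.prod(f[start:] ** 2)
    return gamma, delta

tri = np.array([[100.0, 150.0, 165.0],
                [110.0, 176.0, np.nan],
                [120.0, np.nan, np.nan]])
f = np.array([2.0, 1.5])        # toy factors (not the CL estimates of this triangle)
s2 = np.array([21.0, 15.0])     # toy variances, so that s2 / col_sums = 0.1 in each column
gamma_c, delta_c = cred_msep_terms(tri, f, s2, i=2)
```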



5. ABC for intractable likelihoods and numerical Markov chain sampler 

To estimate numerically the parameters, predicted claims and associated uncertainty measures 
such as the MSEP presented in the previous sections, the Bayesian approach requires the ability 
to sample from the posterior distribution of the DFCL model parameters. Obtaining samples 
$\{f^{(t)}, \sigma^{2\,(t)}\}_{t=1:T}$ which are realisations of a random vector distributed with a posterior
distribution $\pi(f, \sigma \mid \mathcal{D}_I)$ in the DFCL model is difficult since the likelihood is intractable. Hence,
standard numerical approaches such as Markov chain Monte Carlo (MCMC) algorithms (see
Gilks et al. [11]) cannot be directly used since they all require explicit repeated evaluation of the 
likelihood function at each stage of the Markov chain sampling algorithm. It is common to avoid 
this difficulty by making distributional assumptions for the form of the likelihood. This then 
violates the DFCL model assumption but allows for relatively standard sampling procedures to 
be applied. In this regard, one possible approach involves making a specific Gaussian assump- 
tion for the likelihood. One problem with this assumption, which is evident immediately, is that 
it precludes skewness in the model. Here, we do not make any such assumptions and instead 
we work in a truly distribution-free model using ABC to facilitate sampling from an intractable 
posterior distribution. 

There is an additional complexity in the DFCL model not typically encountered when working 
with ABC methodology. Typically, ABC methodology is developed in the case in which the 
model likelihood cannot be evaluated point-wise, but conditional on parameter values, synthetic 
data is easily simulated from the model; see examples in Peters-Sisson [16] and Peters et al. [22]. 
This is not the case in the Bayesian DFCL model. Under the DFCL model the likelihood is 
only expressed by moment conditions, hence we cannot evaluate the likelihood point-wise and 
also the simulation from the likelihood cannot be performed directly. This is why we introduce 
the novel concept of the Bayesian bootstrap which is embedded within the ABC methodological 
framework. 

Hence, to sample from the posterior in our DFCL model we develop a novel formulation of 
the ABC methodology based on the bootstrap and conditional back transformation procedure, 
similar to that discussed in Section 4. 

ABC methods aim to sample from posterior distributions in the presence of computationally 
intractable likelihood functions. For an application in risk modelling of ABC methodology, see 
Peters-Sisson [16]. In this article we present a novel MCMC-ABC algorithm. Before presenting
some details of the numerical MCMC procedure, we note that alternative numerical algorithms
could be considered in the ABC context. For example, sequential Monte Carlo (SMC) based
algorithms, which can improve simulation efficiency, can be found in Del Moral et al. [2], Sisson
et al. [25], Peters et al. [17], [18] and Marjoram et al. [15].
5.1. ABC methodology

In this section we provide a brief description of ABC methodology, which describes a suite of 
methods developed specifically for working with models in which the likelihood is computation- 
ally intractable. Here we work with a Bayesian model and consider the likelihood intractability 
to arise in the sense that we may not evaluate the likelihood point-wise. 

The ABC method we consider here embeds an intractable target posterior distribution, in our 
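case denoted by $\pi(f, \sigma \mid \mathcal{D}_I)$, into a general augmented model

\[ \pi(f, \sigma, \mathcal{D}_I^*, \mathcal{D}_I) = \pi(\mathcal{D}_I \mid \mathcal{D}_I^*, f, \sigma)\, \pi(\mathcal{D}_I^* \mid f, \sigma)\, \pi(f, \sigma), \tag{5.1} \]

where $\mathcal{D}_I^*$ is an auxiliary vector on the same space as $\mathcal{D}_I$. In this augmented Bayesian model, the
weighting function $\pi(\mathcal{D}_I \mid \mathcal{D}_I^*, f, \sigma)$ weights the intractable posterior. In this paper we consider
the hierarchical model assumption, where we work with $\pi(\mathcal{D}_I \mid \mathcal{D}_I^*, f, \sigma) = g(\mathcal{D}_I \mid \mathcal{D}_I^*)$; see Reeves
and Pettitt [24].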

The mechanism in the ABC framework which allows one to avoid the evaluation of the intractable 
likelihood involves replacing this evaluation with data simulation from the likelihood. That is, 
given a realisation of the parameters of the model, a synthetic data set $\mathcal{D}_I^*$ is generated and
compared to the original data set. This is a key aspect of the novel methodology we develop 
in this paper, since we utilise a bootstrap procedure to perform this simulation in the DFCL 
model setting. 

Then summary statistics $S(\mathcal{D}_I^*)$ derived from this data are compared to summary statistics of the
observed data $S(\mathcal{D}_I)$ and a distance $\rho(S(\mathcal{D}_I^*), S(\mathcal{D}_I))$ is calculated. Finally, a weight is given to
these parameters according to the weighting function $g(\mathcal{D}_I \mid \mathcal{D}_I^*)$, which may give greater weight
when $S(\mathcal{D}_I^*)$ and $S(\mathcal{D}_I)$ are close (i.e. where $\rho(S(\mathcal{D}_I^*), S(\mathcal{D}_I))$ is small).

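For example, under the "Hard Decision" (HD) weighting given by

\[ g(\mathcal{D}_I \mid \mathcal{D}_I^*) \propto \begin{cases} 1, & \text{if } \rho(S(\mathcal{D}_I^*), S(\mathcal{D}_I)) \le \epsilon, \\ 0, & \text{otherwise;} \end{cases} \tag{5.2} \]

a reward is given to summary statistics of the augmented auxiliary variables $S(\mathcal{D}_I^*)$ within an
$\epsilon$-tolerance of the summary statistic of the actual observed data $S(\mathcal{D}_I)$, as measured by distance
metric $\rho$.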

Hence, in the ABC context, an approximation to the intractable target posterior marginal
distribution $\pi(f, \sigma \mid \mathcal{D}_I)$, for which we are interested in formulating an empirical estimate, is
given by

\[ \pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon) \propto \int g(\mathcal{D}_I \mid \mathcal{D}_I^*)\, \pi(\mathcal{D}_I^* \mid f, \sigma)\, \pi(f, \sigma)\, d\mathcal{D}_I^*. \tag{5.3} \]

As briefly mentioned, obtaining samples from the ABC posterior can be achieved using a number
of numerical procedures; in this paper we consider an MCMC approach. The MCMC class
of likelihood-free algorithms is justified on a joint space formulation, in which the stationary
distribution of the Markov chain is given by $\pi_{ABC}(f, \sigma, \mathcal{D}_I^* \mid \mathcal{D}_I, \epsilon)$. The corresponding target
distribution for the marginal distribution $\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon)$ is then obtained via numerical integration.
Note that the marginal posterior distribution $\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon) \to \pi(f, \sigma \mid \mathcal{D}_I)$ as $\epsilon \to 0$,
recovering the "true" (intractable) posterior, assuming that $S(\mathcal{D}_I)$ are sufficient statistics and
that the weighting function converges to a point mass on $S(\mathcal{D}_I)$ as $\epsilon \to 0$; see Peters-Sisson [16]
and references therein for detailed discussion. Accordingly, the tolerance $\epsilon$ is typically set as low
as possible for a given computational budget. In this paper we focus on the class of MCMC-based
sampling algorithms.

The ABC methodology is novel both in the statistics literature and in the actuarial literature.
It is informative to clearly provide the justification for this approach both theoretically and
numerically. The simplest understanding of ABC is achieved by considering a rejection algorithm;
we therefore provide a basic argument for how the ABC methodology works in simple rejection
sampling in Appendix A. The actuarial DFCL model considered in this paper requires the more
sophisticated MCMC-ABC methodology described below.

5.2. Technical justification for MCMC-ABC algorithm

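For given observations $\mathcal{D}_I$ we want to sample from $\pi_{ABC}(f, \sigma \mid \mathcal{D}_I)$ with an intractable likelihood
function. We assume that $S(\mathcal{D}_I)$ is either the data itself or a summary of the data such as a
sufficient statistic for the model from which we assume data $\mathcal{D}_I$ is a realisation. We assume
that, given a set of parameters values $(f, \sigma)$, we can generate from the DFCL model (via a
conditional bootstrap procedure) a synthetic data set denoted $\mathcal{D}_I^*$. We define a hard decision
function $g(\mathcal{D}_I \mid \mathcal{D}_I^*) = \mathbb{I}\{\rho(S(\mathcal{D}_I^*), S(\mathcal{D}_I)) < \epsilon\}(\mathcal{D}_I^*)$ for a given tolerance level $\epsilon > 0$ and a
distance metric $\rho(\cdot, \cdot)$, where $\mathbb{I}\{\cdot\}$ is the indicator function which equals 1 if the event is true and
0 otherwise. As demonstrated in Appendix A, we use the approximation (A.3)-(A.4), which
gives us in the Bayesian DFCL model setting,

$$\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon) = \frac{\int g(\mathcal{D}_I \mid \mathcal{D}_I^*)\, \pi(\mathcal{D}_I^* \mid f, \sigma)\, \pi(f, \sigma)\, d\mathcal{D}_I^*}{\int g(\mathcal{D}_I \mid \mathcal{D}_I^*)\, \pi(\mathcal{D}_I^* \mid f, \sigma)\, \pi(f, \sigma)\, d\mathcal{D}_I^*\, df\, d\sigma} = \frac{\pi(f, \sigma)\, E\left[g(\mathcal{D}_I \mid \mathcal{D}_I^*) \mid f, \sigma\right]}{E\left[g(\mathcal{D}_I \mid \mathcal{D}_I^*)\right]}. \tag{5.4}$$

In the next step the numerator of (5.4) is approximated using the empirical distribution:

$$\pi(f, \sigma)\, E\left[g(\mathcal{D}_I \mid \mathcal{D}_I^*) \mid f, \sigma\right] \approx \pi(f, \sigma)\, \frac{1}{L} \sum_{l=1}^{L} g\left(\mathcal{D}_I \mid \mathcal{D}_I^{*,(l)}(f, \sigma)\right), \tag{5.5}$$

where $\mathcal{D}_I^{*,(l)}(f, \sigma) \sim \pi(\mathcal{D}_I^* \mid f, \sigma)$. Finally, we need to consider the denominator $E[g(\mathcal{D}_I \mid \mathcal{D}_I^*)]$. In
general this has a non-trivial form that cannot be calculated analytically. However, since we use
an MCMC based method the denominators cancel in the accept-reject stage of the algorithm.
Therefore, the intractability of the denominator does not impede sampling from the posterior.
Thus we use

$$\begin{aligned}
\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon) &= \frac{\int g(\mathcal{D}_I \mid \mathcal{D}_I^*)\, \pi(\mathcal{D}_I^* \mid f, \sigma)\, \pi(f, \sigma)\, d\mathcal{D}_I^*}{\int g(\mathcal{D}_I \mid \mathcal{D}_I^*)\, \pi(\mathcal{D}_I^* \mid f, \sigma)\, \pi(f, \sigma)\, d\mathcal{D}_I^*\, df\, d\sigma} \\
&\propto \pi(f, \sigma)\, E\left[g(\mathcal{D}_I \mid \mathcal{D}_I^*) \mid f, \sigma\right] \\
&\approx \pi(f, \sigma)\, \frac{1}{L} \sum_{l=1}^{L} g\left(\mathcal{D}_I \mid \mathcal{D}_I^{*,(l)}(f, \sigma)\right)
\end{aligned} \tag{5.6}$$

in order to obtain samples from $\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon)$. Almost universally, $L = 1$ is adopted to
reduce computation but on the other hand this will slow down the rate of convergence to the
stationary distribution.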
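
The Monte Carlo average in (5.5) can be sketched as follows. This is only an illustration under stated assumptions: the simulator `simulate_summary` is a hypothetical stand-in (the sample mean of exponential draws) for the bootstrap simulation of $\mathcal{D}_I^*$, and the absolute difference plays the role of $\rho$.

```python
import random


def simulate_summary(theta, n, rng):
    """Hypothetical simulator: sample mean of n exponential draws with mean theta."""
    return sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n


def abc_weight_estimate(theta, s_obs, eps, L, n, rng):
    """Average of L hard decisions, estimating E[g(D | D*) | theta] as in (5.5)."""
    hits = sum(abs(simulate_summary(theta, n, rng) - s_obs) < eps for _ in range(L))
    return hits / L


rng = random.Random(1)
s_obs = 2.0  # illustrative observed summary statistic
near = abc_weight_estimate(2.0, s_obs, eps=0.2, L=500, n=50, rng=rng)
far = abc_weight_estimate(4.0, s_obs, eps=0.2, L=500, n=50, rng=rng)
```

With $L = 1$ the estimate is a single 0/1 indicator, which is cheap but noisy; larger $L$ reduces the noise at the price of $L$ simulations per parameter value.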

Note that sometimes one also uses softer decision functions for $g(\cdot \mid \cdot)$. The role of the distance
measure $\rho$ is evaluated by Peters et al. [22]. We further extend this analysis to the class of
models considered in this paper. We analyse several choices for the distance measure $\rho$ such as
Mahalanobis distance, scaled Euclidean distance and the Manhattan "City Block" distance. Fan
et al. [8] demonstrate that it is not efficient to utilise the standard Euclidean distance, especially
when summary statistics considered are on different scales.

Additionally, when using an MCMC-ABC algorithm, it is important to assess convergence diagnostics,
particularly because serial correlation in the Markov chain samples
can be significant if the sampler is not designed carefully. We assess autocorrelation of the
simulated Markov chain, the Geweke [10] time series statistic and the Gelman-Rubin [9]
R-statistic convergence diagnostic in an ABC setting.

Concluding: We apply three different techniques in order to treat the intractable likelihood:

1. ABC is used to get a handle on the likelihood and therefore the intractable posterior.

2. As a result of using ABC we need to be able to generate synthetic data samples from
the DFCL model given realisations of the parameters. These data samples come from the
bootstrap algorithm.

3. We use a well understood MCMC based sampling algorithm that does not require calculation
of the non-analytic normalizing constants for the target distribution $\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon)$.
The reason for this is that in the acceptance probability of the MCMC algorithm, the
normalizing constant for the target posterior appears both in the numerator and denominator,
resulting in cancellation.

The specific details of the MCMC algorithm and ABC choices are provided in Appendix B.

6. Example 1: Analysis of MCMC-ABC bootstrap methodology on synthetic data

To test the accuracy of the methodology, first we use synthetic data generated with known 
parameter values. The tuning of the proposal distribution in this study is done for the simplest 
"base" distance metric, the weighted Euclidean distance. To study the effect of the distance 
metric in a comparative fashion we shall keep the proposal distribution unchanged. 

The first example we present has a claims triangle of size $I = J = 9$. In this example we fix the
true model parameters, denoted by $f = (f_0, \ldots, f_{J-1})$ and $\sigma^2 = (\sigma_0^2, \ldots, \sigma_{J-1}^2)$ and given in
Table 2, used to generate the synthetic data set.

6.1. Generation of synthetic data 

To generate the synthetic observations for $\mathcal{D}_I$, we generate randomly the first column (i.e. $\mathcal{B}_0$).
Then conditional on this realisation of $\mathcal{B}_0$ we make use of the model given in (2.1) to generate the
remaining columns of $\mathcal{D}_I$, ensuring the model assumptions are satisfied. This requires setting
$C_{i,0}$ sufficiently large (for appropriate choices of $f$ and $\sigma^2$) and then sampling i.i.d. realisations
of $\varepsilon_{i,j} \sim U[-\sqrt{3}, \sqrt{3}]$ used to obtain $\mathcal{D}_I$; see the observations in Table 2.
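
The generation step can be sketched in a few lines of Python. The sketch assumes that model (2.1) has the usual time series form $C_{i,j+1} = f_j C_{i,j} + \sigma_j \sqrt{C_{i,j}}\, \varepsilon_{i,j+1}$; the uniform law for the first column, the function name and all numerical values are hypothetical choices made for illustration only.

```python
import math
import random


def simulate_triangle(f, sigma, c0_low, c0_high, rng):
    """Simulate an upper claims triangle {C_ij : i + j <= I} with I = J = len(f).

    Assumed recursion: C_{i,j+1} = f_j C_ij + sigma_j sqrt(C_ij) eps, eps ~ U[-sqrt(3), sqrt(3)].
    """
    big_i = len(f)
    triangle = []
    for i in range(big_i + 1):
        row = [rng.uniform(c0_low, c0_high)]  # first column C_{i,0}, assumed uniform
        for j in range(big_i - i):  # accident year i is observed up to development year I - i
            eps = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))  # mean 0, variance 1
            row.append(f[j] * row[j] + sigma[j] * math.sqrt(row[j]) * eps)
        triangle.append(row)
    return triangle


f_true = [1.5, 1.2, 1.1]      # illustrative chain ladder factors
sigma_true = [1.0, 0.5, 0.2]  # illustrative standard deviation parameters
triangle = simulate_triangle(f_true, sigma_true, 100.0, 200.0, random.Random(0))
```

Because the noise is bounded, choosing the first column large relative to $\sigma_j / f_j$ keeps every simulated cumulative claim positive.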

6.2. Sensitivity analysis and convergence assessment 

We perform a sensitivity analysis, studying the impact of the distance metric on the mixing of 
the Markov chain in the case of joint estimation of the chain ladder factors and the variance 
parameters. 

The coefficient of variation of the Gamma proposal distribution for each parameter of
the posterior was pre-tuned using the following settings: $T_b = 50{,}000$, $T = 200{,}000$, $\epsilon_{\min} = 0.1$
and initial values $\gamma_j = 1$ for all $j \in \{1, \ldots, 2J\}$. Additionally, the prior parameters for the chain
ladder factors $F_j$ were set as $(\alpha, \beta) = (2, 1.2/2)$ and the parameters for the variance parameters
$\Xi_j^2$ were set as $(a, b) = (2, 1/2)$.

After tuning the proposal distributions during burn-in and rounding the shape parameters, we
found that $\gamma_j = 10$ for all $j \in \{1, \ldots, 2J\}$ produced average acceptance probabilities for each
parameter between 0.3 and 0.5. This is a range typically used in practice when designing MCMC
sampling algorithms.

Then, keeping the proposal distribution constant and using a common data set $\mathcal{D}_I$, we ran three
versions of the MCMC-ABC algorithm for 200,000 samples corresponding to:

1. scaled Euclidean distance and joint estimation of posterior for $F, \Xi^2$;

2. Mahalanobis distance (modified) and joint estimation of posterior for $F, \Xi^2$; and

3. Manhattan "City Block" distance and joint estimation of posterior for $F, \Xi^2$.

6.3. Convergence diagnostics 

We estimate the three convergence diagnostics given in Appendix B. The results of this analysis 
are presented as a function of Markov chain iteration t post burn-in of 50,000 samples. 

Autocorrelation Function: Figure 1 shows the estimated autocorrelation functions for the
Markov chains of the random variables $F_0$ and $\Xi_0$. We analyze the marginal parameters to
get a reasonable estimate of the mixing behavior of the MCMC-ABC algorithm. The results
demonstrate the degree of serial correlation in the Markov chains generated for these parameters
as a function of lag time $\tau$. The higher the decay rate in the tail of the estimated ACF as
a function of $\tau$, the better the mixing of the MCMC algorithm. Due to the independence
properties of this model there is little difference between results obtained for Scaled Euclidean
and Mahalanobis distances. As shown in Appendix C, the estimate of the covariance matrix
is diagonal on all but the right lower $2 \times 2$ block. Hence, we recommend using the simple
Scaled Euclidean distance metric as it provided the best trade-off between simplicity and mixing
performance.
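
To make the ACF diagnostic concrete, the fragment below estimates the autocorrelation of a chain at a given lag, using the sample mean and variance of the whole chain and dividing the lagged sum by the number of available pairs. The AR(1) chain is a hypothetical stand-in for MCMC-ABC output and all names are illustrative.

```python
import random


def acf(chain, lag):
    """Estimated autocorrelation at the given lag, centred at the overall sample mean."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag)) / (n - lag)
    return cov / var


# Hypothetical stand-in for sampler output: an AR(1) chain with coefficient 0.9.
rng = random.Random(0)
chain, x = [], 0.0
for _ in range(20_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

acf_values = [acf(chain, lag) for lag in (0, 1, 10, 50)]
```

A slowly decaying sequence of values indicates strong serial correlation, and hence poorer mixing, for the parameter under study.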

Geweke Time Series Diagnostic: Figure 2 shows results for the Geweke time series diagnostic.
Again, we present the results for the random variables $F_0$ and $\Xi_0$. Note, we used the
posterior mean as the sample function and a set of increasing values for the chain length from
$T_b + 5{,}000$, increasing in steps of 5,000 samples, to $T$. In each case we split the chain in each "window" given
by $\{\theta_i^{(t)}\}_{t=1:T_1}$ and $\{\theta_i^{(t)}\}_{t=T^*:\tilde{T}}$ according to recommendations from Geweke et al. [10]. We then
calculate the convergence diagnostic $Z_{\tilde{T}}$, which is the difference between these two means divided
by the asymptotic standard error of their difference. As the chain length increases, $\tilde{T} \to \infty$, the
sampling distribution of $Z_{\tilde{T}} \to \mathcal{N}(0, 1)$ if the chain has converged. Hence values of $Z_{\tilde{T}}$ in the tails
of a standard normal distribution suggest that the chain was not fully converged early on (i.e.
during the 1st window). Hence, we plot $Z_{\tilde{T}}$ scores versus increasing $\tilde{T}$ and monitor if they lie
within a 95% confidence interval $Z_{\tilde{T}} \in [-1.96, 1.96]$. The results in Figure 2 clearly demonstrate
that the convergence properties of the distance functions differ. Again this is more material in the
Markov chain for the variance parameter when compared to the Markov chain results for the
chain ladder factor. The main point we note is that again one would advise against use of the
"City block" distance metric.

Gelman and Rubin R statistic: Figure 3 presents the Gelman and Rubin convergence 
diagnostic. To calculate this we ran 20 chains in parallel, each of length 10,000 samples and for 
each chain we discarded 250 samples as burn-in. We then estimated the R statistic as a function 
of simulation time post burn-in. Figure 3 shows the convergence rate of the R statistic to 1 for 
each distance metric on increasing blocks of 200 samples. Using this summary statistic, all three 
distance metrics are very similar in terms of convergence rate of the R statistic to 1. 
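
For orientation, the R statistic can be computed from parallel chains as in the sketch below. It uses the standard textbook form (between-chain variance $B$, within-chain variance $W$), which is assumed here and may differ in detail from the variant described in Appendix B; the Gaussian chains are hypothetical stand-ins for sampler output.

```python
import math
import random


def gelman_rubin(chains):
    """Textbook potential scale reduction factor for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Five chains targeting the same distribution, and five chains stuck at different levels.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2_000)] for _ in range(5)]
stuck = [[rng.gauss(float(k), 1.0) for _ in range(2_000)] for k in range(5)]
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree; values well above 1 indicate that at least one chain has not yet reached the common stationary distribution.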

Overall, these three convergence diagnostics demonstrate that the simple scaled Euclidean
distance metric is the superior choice. Secondly, we see appropriate convergence of the Markov
chains under three convergence diagnostics which test different aspects of the mixing of the
Markov chains, giving confidence in the performance of the MCMC-ABC algorithm for this
model.



6.4. Bayesian parameter estimates

In this section we present results for the scaled Euclidean distance metric, with a Markov chain 
of length 200,000 samples discarding the first 50,000 samples as burn-in. Table 4 shows the CL 
parameter estimates for the DFCL model and the associated parameter estimation error. We 
define the following quantities: 

• $\hat{f}_j^{(MAP)} \mid \sigma_{0:J-1}$, $\hat{f}_j^{(MMSE)} \mid \sigma_{0:J-1}$, $\hat{\sigma}_{f_j} \mid \sigma_{0:J-1}$ and $[\hat{q}_{0.05}, \hat{q}_{0.95}] \mid \sigma_{0:J-1}$ denote respectively the
Maximum a-Posteriori, Minimum Mean Square Error, posterior standard deviation of the
conditional distribution of chain ladder factor $F_j$ and the posterior coverage probability
estimates at 5% of the conditional distribution of chain ladder factor $F_j$. Each of these
estimates is conditional on knowledge of the true $\sigma_{0:J-1}$.

• $\hat{f}_j^{(MAP)}$, $\hat{f}_j^{(MMSE)}$, $\hat{\sigma}_{f_j}$ and $[\hat{q}_{0.05}, \hat{q}_{0.95}]$ denote the same quantities for the unconditional
distribution after joint estimation of $F_{0:J-1}$ and $\Xi_{0:J-1}$.

• $\mathrm{Ave}[A(\theta_{1:2J}, f_j)]$ and $\mathrm{Ave}[A(\theta_{1:2J}, \sigma_j)]$ denote the average acceptance probabilities of the
Markov chain.

• $\hat{\sigma}_j^{2\,(MAP)}$, $\hat{\sigma}_j^{2\,(MMSE)}$, the posterior standard deviation and $[\hat{q}_{0.05}, \hat{q}_{0.95}]$ denote the same quantities for the chain ladder
variances as those defined above for chain ladder factors.

Note, the estimates for $\hat{f}^{(MAP)}$ and $\hat{\sigma}^{(MAP)}$ were obtained marginally. For the frequentist
approach we obtain the standard error in the estimates by using 1,000 bootstrap realisations
$\{\mathcal{D}_I^{*,(s)}\}_{s=1:1,000}$ to obtain $\{\hat{f}^{(CL)}_{(s)}\}_{s=1:1,000}$ and $\{\hat{\sigma}^{2\,(CL)}_{(s)}\}_{s=1:1,000}$. We use these bootstrap samples to
calculate the standard deviation in the estimates of the parameters in the classical frequentist
CL approach, given in brackets (.) next to their corresponding estimators. The standard errors
in the Bayesian parameter estimates are obtained by blocking the Markov chain into 100 blocks
of length 1,500 samples and estimating the posterior quantities on each block.
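
The blocking procedure for the Bayesian standard errors can be sketched as follows. The sketch assumes that the reported standard error is the standard deviation of the block-wise estimates; the Gaussian chain is a hypothetical stand-in for 150,000 post burn-in samples of one parameter, and the posterior mean is used as the example estimator.

```python
import random
import statistics


def block_estimates(chain, n_blocks, estimator):
    """Apply the estimator to each of n_blocks consecutive, equally sized blocks."""
    size = len(chain) // n_blocks
    return [estimator(chain[k * size:(k + 1) * size]) for k in range(n_blocks)]


def block_standard_error(chain, n_blocks, estimator=statistics.mean):
    """Mean and standard deviation of the block-wise estimates (assumed definition)."""
    estimates = block_estimates(chain, n_blocks, estimator)
    return statistics.mean(estimates), statistics.stdev(estimates)


rng = random.Random(0)
chain = [rng.gauss(1.2, 0.05) for _ in range(150_000)]  # stand-in for post burn-in samples
estimate, std_error = block_standard_error(chain, n_blocks=100)
```

Other posterior quantities, such as a quantile, can be passed through the `estimator` argument in the same way.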



7. Example 2: Real Claims Reserving data 



In this example we consider estimation using real claims reserving data from Wüthrich-Merz
[30]; see Table 3. This yearly loss data is turned into annual cumulative claims and divided by
10,000 for the analysis in this example. We use the analysis from the previous study to justify
use of the joint MCMC-ABC simulation algorithm with a scaled Euclidean distance metric.

We pre-tuned the coefficient of variation of the Gamma proposal distribution for each parameter
of the posterior. This was performed using the following settings: $T_b = 50{,}000$, $T = 200{,}000$,
$\epsilon_{\min} = 10^{-5}$ and initial values $\gamma_j = 1$ for all $j \in \{1, \ldots, 2J\}$. Here we make a strict requirement of
the tolerance level to ensure we have accurate results from our ABC approximation. Additionally,
the prior parameters for the chain ladder factors Fj were set as (aj,/3j) = (^,fj CL ^ and the 
parameters for the variance EJ 2 priors were set as (aj,bj) = (l,dj CL ^. The code for this 
problem was written in Matlab and it took approximately 10 min to simulate 200,000 samples
from the MCMC-ABC algorithm on an Intel Xeon 3.4 GHz processor with 2 GB RAM.

After tuning the proposal distributions during burn-in we obtained rounded shape parameters
$\gamma_{1:9} = [50;\ 100;\ 500;\ 500;\ 5{,}000;\ 20{,}000;\ 100{,}000;\ 2{,}000{,}000;\ 3{,}000{,}000]$, which provided average
acceptance probabilities between 0.3 and 0.5.

Estimates of $f$ and $\sigma$

Figure 4 presents box-whisker plots of estimates of the distributions of the parameters $F_{0:J-1}$
and $\Xi_{0:J-1}$ obtained from the MCMC-ABC algorithm, post burn-in. Figure 5 shows the Bayesian
MCMC-ABC empirical distributions of the ultimate claims, $C_{i,J}$ for $i = 1, \ldots, I$. In Table 5
we present the predicted cumulative claims for each year along with the estimates for the chain
ladder factors and chain ladder variances under both the classical approach and the Bayesian
model. We see that with this fairly vague prior specified, we do indeed obtain convergence
of the MCMC-ABC based Bayesian estimates $\hat{f}^{(MMSE)}, \hat{\sigma}^{(MMSE)}$ to the classical estimates
$\hat{f}^{(CL)}, \hat{\sigma}^{(CL)}$.

Dependence on tolerance $\epsilon$

Figure 6 presents a study of the histogram estimate of the marginal posterior distribution for
chain ladder factor $\pi_{ABC}(f_j \mid \mathcal{D}_I, \epsilon_{\min})$. The plot was obtained by sampling from the full posterior
$\pi_{ABC}(f, \sigma \mid \mathcal{D}_I, \epsilon_{\min})$ for each specified tolerance value, $\epsilon_{\min}$. Then the samples for the particular
chain ladder parameter in each plot are turned into a smoothed histogram estimate for each
$\epsilon_{\min}$ and plotted. The results of this analysis demonstrated that when $\epsilon$ is large, in this model
greater than around $\epsilon_{\min} = 0.1$, the likelihood is not having an influence on the ABC posterior
distribution. Hence, under an MCMC-ABC algorithm, this results in acceptance probabilities
for the chain being artificially high, resulting in estimates of the posterior which reflect the prior
distribution used (in this case a vague prior). As $\epsilon_{\min}$ is reduced, we notice that the changes in
the estimate of the posterior distribution also reduce. The aim of this study is to demonstrate
that once $\epsilon_{\min}$ reaches a small enough level, the effect of reducing it further is minimal on the
posterior distribution. We see that changing $\epsilon_{\min}$ from $10^{-4}$ to $10^{-5}$ has not had a material
impact on the posterior mean or variance; the change is less than 10%. As a result, reducing
$\epsilon_{\min}$ past this point cannot be justified relative to the significant increase in computational effort
required to achieve such a further reduction in $\epsilon_{\min}$.

Ultimately, we would like an algorithm which could work well for any $\epsilon_{\min}$, the smaller the better.
However, we note that with a decreasing $\epsilon_{\min}$ in the sampler we present in this paper, one must
take additional care to ensure the Markov chain is still mixing and not "stuck" in a particular
state, as is observed to be the case in all MCMC-ABC algorithms. To avoid this acknowledged
difficulty with MCMC-ABC, one should run much longer MCMC chains or alternatively use
more sophisticated sampling algorithms such as SMC Samplers PRC-ABC based algorithms; see
Sisson et al. [25].

The conclusion of these findings is that a value of $\epsilon_{\min} = 10^{-5}$, which was used for the analysis
of the data in this paper, is suitable numerically and computationally.



VaR and MSEP. 

In Table 6 we present the predictive VaR at 95% and 99% levels for the ultimate predicted claims, 
obtained from the MCMC-ABC algorithm. These are easily obtained under the Bayesian setting, 
using the MCMC-ABC posterior samples to explicitly obtain samples from the full predictive 
distribution of the cumulative claims after integrating out the parameter uncertainty numeri- 
cally. In addition to this, we present the analysis of the MSEP under the bootstrap frequentist 
procedure and the Bayesian MCMC-ABC and credibility estimates for the total predicted cumu- 
lative claims for each accident year i. We also present results for the sum of the total cumulative 
claims for each accident year, and the associated parameter uncertainty and process variance 
(see Section 4 for details). 
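
A predictive VaR of this kind is simply an empirical quantile of simulated ultimate claims, one simulation per posterior draw of the parameters. The sketch below illustrates the idea under stated assumptions: the posterior draws are hypothetical placeholders, the roll-forward uses the assumed recursion $C_{j+1} = f_j C_j + \sigma_j \sqrt{C_j}\, \varepsilon$ with uniform noise in place of the bootstrap residuals, and all names are illustrative.

```python
import math
import random


def empirical_var(samples, level):
    """Smallest sample value with at least a fraction `level` of the samples at or below it."""
    ordered = sorted(samples)
    k = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[k]


def predictive_ultimates(c_latest, f_draws, sigma_draws, rng):
    """One simulated ultimate claim per posterior draw, so parameter uncertainty is integrated out."""
    ultimates = []
    for f, sigma in zip(f_draws, sigma_draws):
        c = c_latest
        for f_j, s_j in zip(f, sigma):
            eps = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))  # assumed noise, variance 1
            c = f_j * c + s_j * math.sqrt(c) * eps
        ultimates.append(c)
    return ultimates


rng = random.Random(0)
# Hypothetical posterior draws for two remaining development periods.
f_draws = [[rng.gauss(1.10, 0.01), rng.gauss(1.05, 0.005)] for _ in range(5_000)]
sigma_draws = [[1.0, 0.5] for _ in range(5_000)]
ultimates = predictive_ultimates(1_000.0, f_draws, sigma_draws, rng)
var_95 = empirical_var(ultimates, 0.95)
var_99 = empirical_var(ultimates, 0.99)
```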

We can make the following conclusions from these results: 

1. The estimates of process variance for each $C_{i,J}$ demonstrate that the frequentist bootstrap
and the credibility estimates are very close for all accident years $i$. The Bayesian results
compare favorably with the credibility results.

2. The results for the parameter estimation error for the predicted cumulative claims $C_{i,J}$
demonstrate for small $i$ that the Bayesian approach results in a smaller estimation error
compared to the frequentist approach. For large $i$, the Bayesian approach produces larger
estimation error relative to the credibility approach.

3. The total results for the process variance for $C = \sum_i C_{i,J}$ demonstrate that the frequentist
and credibility results are very close. Additionally, Bayesian total results are largest,
followed by credibility and then frequentist estimates, which is in agreement with theoretical
bounds.

4. The total results for the parameter estimation error for $C = \sum_i C_{i,J}$ demonstrate that the
frequentist unconditional bootstrap procedure results in the lowest total error. The Bayesian
approach and credibility total parameter errors are close. Additionally, we note that the
result in Table 7.1 of Wüthrich-Merz [30], for the total parameter estimation error under
an unconditional frequentist bootstrap with unscaled residuals, is also very close to the
total obtained under the frequentist approach.

8. Discussion 

This paper has presented a distribution-free claims reserving model under a Bayesian paradigm. 
A novel advanced MCMC-ABC algorithm was developed to obtain estimates from the resulting 
intractable posterior distribution of the chain ladder factors and chain ladder variances. We 
assessed several aspects of this algorithm, including the properties of the convergence of the 
MCMC algorithm as a function of the distance metric approximation in the ABC component. 
The methodology's performance was demonstrated on a synthetic data set generated from known
parameters. Next, it was applied to a real claims reserving data set. The results we obtained 
for predicted cumulative ultimate claims were compared to those obtained via classical chain 
ladder methods and via credibility theory. This clearly demonstrated that the algorithm is 
working accurately and provides us not only with the ability to obtain point estimates for the 
first and second moments of the ultimate cumulative claims, but also with an accurate empirical 
approximation of the entire distribution of the ultimate claims. This is valuable for many 
reasons, including prediction of reserves which are not based on centrality measures such as the 
tail based VaR results we present.

A. ABC algorithm



The ABC algorithm is typically justified in the simple rejection sampling framework. This then
extends in a straightforward manner to other sampling frameworks such as the MCMC algorithm
we utilise in this paper. We denote the posterior density from which we wish to draw samples
by $\pi(\theta \mid y) \propto \pi(y \mid \theta)\, \pi(\theta)$ with $\theta \in \Omega$, where $\Omega$ denotes the support of the posterior distribution and
$\mathcal{Y}$ is the support for $y$.

The ABC method aims to draw from this posterior density $\pi(\theta \mid y)$ without the requirement of
evaluating the computationally expensive or, in our setting, intractable likelihood $\pi(y \mid \theta)$. The
cost of avoiding this calculation is that we obtain an "approximation".



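1st case. We assume that the support $\mathcal{Y}$ is discrete. Given an observation $y \in \mathcal{Y}$, we would
like to sample from $\pi(\theta \mid y)$. Then the original rejection sampling algorithm reads as follows:

Rejection Sampling ABC

1. Sample $\theta'$ from prior $\pi(\theta)$;

2. Simulate synthetic data set of auxiliary variables $x \mid \theta' \sim \pi(x \mid \theta')$;

3. ABC Rejection condition: if $x = y$ then accept sample $\theta'$, else reject sample and return to
step 1.

Then the chosen $\theta'$ is distributed from $\pi(\theta \mid y)$. This follows from a simple rejection argument.
Denote $\{x = y\}$ if $\theta'$ was chosen. Then, the joint density of $(\theta', x)$ conditional on $\{y, x = y\}$ is
given by

$$\pi(\theta', x \mid y, x = y) = \frac{\pi(\theta')\, \pi(x \mid \theta')\, \mathbb{I}\{x = y\}(x)}{\int \pi(\theta)\, \pi(y \mid \theta)\, d\theta}.$$

This implies that

$$\pi(\theta', x \mid y, x = y) = \begin{cases} \dfrac{\pi(\theta')\, \pi(y \mid \theta')}{\pi(y)} = \pi(\theta' \mid y), & \text{if } x = y, \\ 0, & \text{otherwise,} \end{cases} \tag{A.1}$$

$$\sum_{x \in \mathcal{Y}} \pi(\theta', x \mid y, x = y) = \pi(\theta' \mid y). \tag{A.2}$$

Hence, this algorithm generates samples $\theta^{(t)} \sim \pi(\theta \mid y)$, for $t = 1, \ldots, T$.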



2nd case. For more general supports $\mathcal{Y}$ one replaces the strict equality $x = y$ with a tolerance
$\epsilon > 0$ and a measure of discrepancy or a distance metric $\rho(x, y) < \epsilon$. In this case the
posterior distribution is given by $\pi(\theta, x \mid y, \rho(x, y) < \epsilon)$. Implementing this algorithm in a rejection
sampling framework gives the following:

Rejection Sampling ABC

1. Sample $\theta'$ from prior $\pi(\theta)$;

2. Simulate synthetic data set of auxiliary variables $x \mid \theta' \sim \pi(x \mid \theta')$;

3. ABC Rejection Condition 2: If $\rho(x, y) < \epsilon$ then accept sample $\theta'$, else reject sample and
return to step 1.

In this case the joint density of $(\theta', x)$, conditional on $\{y, \rho(x, y) < \epsilon\}$, is given by

$$\pi(\theta', x \mid y, \rho(x, y) < \epsilon) = \frac{\pi(\theta')\, \pi(x \mid \theta')\, \mathbb{I}\{\rho(x, y) < \epsilon\}(x)}{\int \int \pi(\theta)\, \pi(x \mid \theta)\, \mathbb{I}\{\rho(x, y) < \epsilon\}(x)\, dx\, d\theta}. \tag{A.3}$$

Note that for appropriate choices of the distance metric $\rho$ and assuming the necessary continuity
properties for the densities we obtain that

$$\lim_{\epsilon \to 0} \int \pi(\theta, x \mid y, \rho(x, y) < \epsilon)\, dx = \pi(\theta \mid y). \tag{A.4}$$
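
The rejection sampler with tolerance can be illustrated on a toy problem. In the sketch below the prior is uniform on $[-10, 10]$, the model is a single Gaussian observation with unit variance, and $\rho(x, y) = |x - y|$; these choices and all names are assumptions made for illustration and are unrelated to the DFCL model. For this toy problem the exact posterior is approximately Gaussian with mean $y$ and unit variance, which gives a simple check of the approximation.

```python
import random
import statistics


def rejection_abc(y_obs, eps, n_accept, rng):
    """Rejection Sampling ABC with tolerance eps on a toy Gaussian model."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(-10.0, 10.0)  # step 1: sample theta' from the prior
        x = rng.gauss(theta, 1.0)         # step 2: simulate synthetic data x | theta'
        if abs(x - y_obs) < eps:          # step 3: accept if rho(x, y) < eps
            accepted.append(theta)
    return accepted


samples = rejection_abc(y_obs=3.0, eps=0.1, n_accept=1_000, rng=random.Random(0))
posterior_mean = statistics.mean(samples)
posterior_sd = statistics.stdev(samples)
```

Shrinking `eps` brings the accepted samples closer to the exact posterior but lowers the acceptance rate, which is the computational trade-off discussed in the main text.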

This concept was taken further with the intention of improving the simulation efficiency by
reducing the number of rejected samples. To achieve this, sufficient statistics were used to replace
the comparison between the auxiliary variables ("synthetic data") $x$ and the observations $y$.
Denoting the sufficient statistics by $S(y)$ and $S(x)$ allows one to decompose the likelihood under
the Fisher-Neyman factorization theorem into $\pi(y \mid \theta) = f(y)\, g(S(y) \mid \theta)$ for appropriate functions
$f$ and $g$. In the ABC context presented above, the consequence of this decomposition is that when
$\rho(S(y), S(x)) < \epsilon$ the obtained samples are from the posterior density $\pi(\theta, x \mid y, \rho(S(y), S(x)) < \epsilon)$,
similar to (A.3). In general, summary statistics will be used when sufficient statistics are not
attainable.



B. MCMC-ABC to sample from $\pi_{ABC}(f, \sigma \mid \mathcal{D}_I)$

We develop an MCMC-ABC algorithm which has an adaptive proposal mechanism and annealing
of the tolerance during burn-in of the Markov chain. Having reached the final tolerance
post annealing, denoted $\epsilon_{\min}$, we utilise the remaining burn-in samples to tune the proposal
distribution to ensure an acceptance probability in the range between 0.3 and 0.5 is achieved.
The optimal acceptance probability when posterior parameters are i.i.d. Gaussian was proven to
be 0.234; see Roberts et al. [20]. Though our problem does not match the required conditions
for this proof, it provides a practical guide. To achieve this, we tune the coefficient of variation
of the proposal; in our case it is the shape parameter of the Gamma proposal distribution. We
impose an additional constraint that the minimum shape parameter value is set at $\gamma_j^{\min}$ for
$j \in \{1, \ldots, 2J\}$.



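MCMC-ABC algorithm using bootstrap samples.

1. For $t = 0$ initialize the parameter vector randomly; this gives $\theta^{(0)}_{1:2J} = (f^{(0)}_{0:J-1}, \sigma^{(0)}_{0:J-1})$.
Initialize the proposal shape parameters $\gamma_j \geq \gamma_j^{\min}$ for all $j \in \{1, \ldots, 2J\}$.

2. For $t = 1, \ldots, T$

(a) Set $\theta^{(t)}_{1:2J} = \theta^{(t-1)}_{1:2J}$.

(b) For $j = 1, \ldots, 2J$

i. Sample proposal $\theta_j^*$ from a $\Gamma(\gamma_j, \theta_j^{(t)}/\gamma_j)$-distribution. We denote the Gamma
proposal density by $K(\theta_j^*; \gamma_j, \theta_j^{(t)}/\gamma_j)$. This gives proposed parameter vector
$\theta^* = (\theta^{(t)}_{1:j-1}, \theta_j^*, \theta^{(t)}_{j+1:2J})$.

ii. Conditional on $\theta^* = (\theta^{(t)}_{1:j-1}, \theta_j^*, \theta^{(t)}_{j+1:2J})$, generate synthetic bootstrap data set
$\mathcal{D}_I^* = \mathcal{D}_I^*(\theta^*)$ using the bootstrap procedure detailed in Section 4 where we
replace the CL parameter estimates $(\hat{f}^{(CL)}, \hat{\sigma}^{(CL)})$ by the parameters $\theta^*$.

iii. Evaluate summary statistics $S(\mathcal{D}_I; 0, 1)$ and $S(\mathcal{D}_I^*; \mu^*, s^*)$ and corresponding
decision function $g(\mathcal{D}_I \mid \mathcal{D}_I^*)$ as described in Section 5.

iv. Accept proposal with ABC acceptance probability

$$A(\theta^{(t)}_{1:2J}, \theta^*) = \min\left\{1, \frac{\pi(\theta^*)\, K(\theta_j^{(t)}; \gamma_j, \theta_j^*/\gamma_j)}{\pi(\theta^{(t)}_{1:2J})\, K(\theta_j^*; \gamma_j, \theta_j^{(t)}/\gamma_j)}\, g(\mathcal{D}_I \mid \mathcal{D}_I^*)\right\}.$$

That is, simulate $U \sim U(0, 1)$ and set $\theta_j^{(t)} = \theta_j^*$ if $U < A(\theta^{(t)}_{1:2J}, \theta^*)$.

v. If $100 < t < T_b$ and $\epsilon_t = \epsilon_{\min}$ then check to see if tuning of the proposal is
required. Define the average acceptance probability over the last 100 iterations
of updates for parameter $j$ by $\bar{a}_j^{(t-100:t)}$ and consider the adaption:

$$\gamma_j^* = \begin{cases} 0.9\, \gamma_j, & \text{if } \bar{a}_j^{(t-100:t)} < 0.3 \text{ and } \gamma_j > \gamma_j^{\min}, \\ 1.1\, \gamma_j, & \text{if } \bar{a}_j^{(t-100:t)} > 0.5, \\ \gamma_j, & \text{otherwise.} \end{cases}$$

Then set the proposal shape parameter as $\gamma_j = \max\{\gamma_j^*, \gamma_j^{\min}\}$.

The MCMC-ABC algorithm presented can be enhanced by utilising an idea of Gramacy et al.
[13] in an ABC setting. This involves a combination of tempering the tolerance $\{\epsilon_t\}_{t=1:T}$ and
importance sampling corrections.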
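
To illustrate the basic accept-reject mechanics of MCMC-ABC with a Gamma proposal (without tolerance annealing, proposal adaption or the tempering enhancement), the following sketch updates a single positive parameter. It is a toy under stated assumptions: the simulator draws exponential data instead of bootstrapping a claims triangle, the prior is a Gamma(2, 1) density, the proposal is taken to be a Gamma distribution with shape `gamma_shape` and mean equal to the current value, and all names are hypothetical.

```python
import math
import random


def log_gamma_pdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return (shape - 1.0) * math.log(x) - x / scale - shape * math.log(scale) - math.lgamma(shape)


def simulate_summary(theta, n, rng):
    """Hypothetical simulator: sample mean of n exponential draws with mean theta."""
    return sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n


def mcmc_abc(s_obs, eps, n_iter, gamma_shape, rng, n=50, prior=(2.0, 1.0)):
    """Metropolis-Hastings ABC with hard decision, L = 1 and a Gamma proposal."""
    theta = s_obs  # start where the hard decision can plausibly be satisfied
    chain = []
    for _ in range(n_iter):
        proposal = rng.gammavariate(gamma_shape, theta / gamma_shape)  # mean equals theta
        if abs(simulate_summary(proposal, n, rng) - s_obs) < eps:      # hard decision g = 1
            log_ratio = (log_gamma_pdf(proposal, *prior) - log_gamma_pdf(theta, *prior)
                         + log_gamma_pdf(theta, gamma_shape, proposal / gamma_shape)
                         - log_gamma_pdf(proposal, gamma_shape, theta / gamma_shape))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta = proposal
        chain.append(theta)  # a rejected proposal repeats the current state
    return chain


chain = mcmc_abc(s_obs=2.0, eps=0.2, n_iter=5_500, gamma_shape=10.0, rng=random.Random(0))
post_burn_in = chain[500:]
posterior_mean = sum(post_burn_in) / len(post_burn_in)
```

The acceptance ratio involves only the prior and proposal densities and the 0/1 decision, so no normalizing constant of the ABC posterior is ever evaluated.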

B.1. ABC algorithmic choices for the time series DFCL model
We start with the choices of the ABC components. 

• Generation of a synthetic data set: Note that in this setting not only is the likelihood
intractable but also the generation of a synthetic data set $\mathcal{D}_I^*$ given the current parameter
values $F, \Xi$ is not straightforward. The synthetic data set $\mathcal{D}_I^*$ is generated using the
bootstrap procedure described in Section 4. Note that both the bootstrap residuals $\varepsilon_{i,j}$ and
the bootstrap samples $\mathcal{D}_I^*$ are functions of the parameter choices; see Section 4.1. Therefore
we generate for given $F = f$ and $\Xi = \sigma$ the bootstrap residuals $\varepsilon_{i,j} = \varepsilon_{i,j}(f_{j-1}, \sigma_{j-1})$
and the bootstrap samples $\mathcal{D}_I^* = \mathcal{D}_I^*(f, \sigma)$ according to the non-parametric bootstrap (see
Section 4.1) where we replace the CL parameter estimates $(\hat{f}^{(CL)}, \hat{\sigma}^{(CL)})$ by the parameters
$\theta = (F, \Xi)$.

• Summary statistics: We introduce summary statistics to replace sufficient statistics
when they are not attainable for a given model, and use them to define the decision
function $g$; see Appendix A. For the observed data $\mathcal{D}_I$
we define the vector

$$S(\mathcal{D}_I; 0, 1) = (S_1, \ldots, S_{n+2}) = (C_{0,1}, \ldots, C_{0,J}, C_{1,1}, \ldots, C_{1,J-1}, \ldots, C_{I-2,1}, C_{I-2,2}, C_{I-1,1}; 0, 1), \tag{B.1}$$

where $n$ denotes the number of residuals $\varepsilon_{i,j}$. For given $\theta = (F, \Xi)$, we generate the
bootstrap sample $\mathcal{D}_I^* = \mathcal{D}_I^*(F, \Xi)$ as described above. The corresponding residuals $\varepsilon_{i,j} =
\varepsilon_{i,j}(F_{j-1}, \Xi_{j-1})$ should also be close to the standardized observations. Therefore, we define
its empirical mean and standard deviation by

$$\mu^* = \mu^*(F, \Xi) = \frac{1}{n} \sum_{i,j} \varepsilon_{i,j}(F_{j-1}, \Xi_{j-1}), \qquad s^* = s^*(F, \Xi) = \left[\frac{1}{n-1} \sum_{i,j} \left(\varepsilon_{i,j}(F_{j-1}, \Xi_{j-1}) - \mu^*(F, \Xi)\right)^2\right]^{1/2}. \tag{B.2}$$

Hence, the summary statistics for the synthetic data are given by

$$S(\mathcal{D}_I^*; \mu^*, s^*) = (C^*_{0,1}, \ldots, C^*_{0,J}, C^*_{1,1}, \ldots, C^*_{1,J-1}, \ldots, C^*_{I-2,1}, C^*_{I-2,2}, C^*_{I-1,1}; \mu^*, s^*).$$

• Distance metrics:

— Mahalanobis distance and scaled Euclidean distance
Here we draw on the analysis of Sisson et al. [8] that proposes the use of the
Mahalanobis distance metric given by

$$\rho\left(S(\mathcal{D}_I; 0, 1), S(\mathcal{D}_I^*; \mu^*, s^*)\right) = \left[S(\mathcal{D}_I; 0, 1) - S(\mathcal{D}_I^*; \mu^*, s^*)\right]^{T} \Sigma_{\mathcal{D}_I}^{-1} \left[S(\mathcal{D}_I; 0, 1) - S(\mathcal{D}_I^*; \mu^*, s^*)\right],$$

where the covariance matrix $\Sigma_{\mathcal{D}_I}$ is an appropriate scaling described in Appendix
C. The scaled Euclidean distance is obtained when we only consider the diagonal
elements of the covariance matrix $\Sigma_{\mathcal{D}_I}$.

Note, the covariance matrix $\Sigma_{\mathcal{D}_I}$ provides a weighting on each element of the vector of
summary statistics to ensure they are scaled appropriately according to their influence
on the ABC approximation. There are many other such weighting schemes one could
conceive.

— Manhattan "City Block" distance
We consider the $L^1$-distance given by

$$\rho\left(S(\mathcal{D}_I; 0, 1), S(\mathcal{D}_I^*; \mu^*, s^*)\right) = \sum_{i=1}^{n+2} \left|S_i(\mathcal{D}_I; 0, 1) - S_i(\mathcal{D}_I^*; \mu^*, s^*)\right|.$$

• Decision function: We work with a hard decision function given by

$$g(\mathcal{D}_I \mid \mathcal{D}_I^*) = \mathbb{I}\left\{\rho\left(S(\mathcal{D}_I; 0, 1), S(\mathcal{D}_I^*; \mu^*, s^*)\right) < \epsilon\right\}.$$
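
The three distances can be written compactly as below. This is a sketch only: the covariance matrix is passed in as a nested list (its construction in Appendix C is not reproduced), the small linear solver is an illustrative helper, and all function names are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(s_obs, s_syn, cov):
    """Quadratic form d^T cov^{-1} d with d = s_obs - s_syn."""
    d = [a - b for a, b in zip(s_obs, s_syn)]
    return sum(d_i * x_i for d_i, x_i in zip(d, solve(cov, d)))


def scaled_euclidean(s_obs, s_syn, cov):
    """Same quadratic form using only the diagonal elements of cov."""
    return sum((a - b) ** 2 / cov[i][i] for i, (a, b) in enumerate(zip(s_obs, s_syn)))


def manhattan(s_obs, s_syn):
    """City Block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(s_obs, s_syn))


diag_cov = [[4.0, 0.0], [0.0, 1.0]]
full_cov = [[2.0, 1.0], [1.0, 2.0]]
d_mahalanobis = mahalanobis([1.0, 1.0], [0.0, 0.0], full_cov)
d_scaled = scaled_euclidean([0.0, 0.0], [2.0, 1.0], diag_cov)
d_manhattan = manhattan([0.0, 0.0], [2.0, 1.0])
```

When the covariance matrix is diagonal the first two distances coincide, which matches the observation that they behave almost identically when the matrix is close to diagonal.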

• Tolerance schedule: We use the sequence

$$\epsilon_t = \max\{20{,}000 - 10t, \epsilon_{\min}\}.$$

Note, the use of an MCMC-ABC algorithm can result in "sticking" of the chain for
extended periods. Therefore, one should carefully monitor convergence diagnostics of the
resulting Markov chain for a given tolerance schedule. There is a trade-off between the
length of the Markov chain required for samples approximately from the stationary
distribution and the bias introduced by non-zero tolerance. In this paper we set $\epsilon_{\min}$ via
preliminary analysis of the Markov chain sampler mixing rates for a transition kernel with
coefficient of variation set to one.

We note that in general, practitioners will have a required precision in posterior estimates
that can be directly used to determine, for a given computational budget, a suitable
tolerance $\epsilon_{\min}$.
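
The linear annealing schedule is straightforward to evaluate; the short sketch below (function and argument names are illustrative) also shows after how many iterations the final tolerance is reached.

```python
def tolerance(t, eps_min, start=20_000.0, step=10.0):
    """Annealed tolerance eps_t = max(start - step * t, eps_min)."""
    return max(start - step * t, eps_min)


eps_min = 1e-5
# First iteration at which the schedule has reached its final value.
annealing_length = next(t for t in range(10**6) if tolerance(t, eps_min) == eps_min)
```

With these constants the tolerance reaches $\epsilon_{\min}$ after 2,000 iterations, so most of a burn-in of 50,000 samples remains available for tuning the proposal at the final tolerance.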

• Convergence diagnostics: We stress that when using an MCMC-ABC algorithm, it is 
crucial to carefully monitor the convergence diagnostics of the Markov chain. This is more 
important in the ABC context than in the general MCMC context due to the possibility of 
extended rejections where the Markov chain can stick in a given state for long periods. This 
can be combatted in several ways which will be discussed once the algorithm is presented. 

The convergence diagnostics we consider are evaluated only on samples post annealing 
of the tolerance threshold and after an initial burn-in period once the tolerance $\epsilon^{\min}$ is 
reached. If the total chain has length $T$, the initial burn-in stage will correspond to the 
first $T_b$ samples and we define $\tilde{T} = T - T_b$. We denote by 
$\{\theta_i^{(t)}\}_{t=1:\tilde{T}}$ the Markov chain of the $i$-th parameter after burn-in. 
The diagnostics we consider are given by: 

— Autocorrelation. This convergence diagnostic will monitor serial correlation in the 
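Markov chain. For given Markov chain samples for the $i$-th parameter 
$\{\theta_i^{(t)}\}_{t=1:\tilde{T}}$, we define the biased autocorrelation estimate at lag $\tau$ by 

\[ \widehat{\mathrm{ACF}}(\theta_i, \tau) = \frac{1}{(\tilde{T} - \tau)\, \hat{\sigma}^2(\theta_i)} \sum_{t=1}^{\tilde{T} - \tau} \left[ \theta_i^{(t)} - \hat{\mu}(\theta_i) \right] \left[ \theta_i^{(t+\tau)} - \hat{\mu}(\theta_i) \right] , \tag{B.3} \]

where $\hat{\mu}(\theta_i)$ and $\hat{\sigma}(\theta_i)$ are the estimated mean and standard deviation of $\theta_i$. 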

— Geweke [10] time series diagnostic. For parameter $\theta_i$ it is calculated as follows: 

1. Split the Markov chain samples into two sequences, $\{\theta_i^{(t)}\}_{t=1:T_1}$ and 
$\{\theta_i^{(t)}\}_{t=T^*:\tilde{T}}$, such that $T^* = \tilde{T} - T_2 + 1$, and with ratios $T_1/\tilde{T}$ and 
$T_2/\tilde{T}$ fixed such that $(T_1 + T_2)/\tilde{T} < 1$ for all $\tilde{T}$. 

2. Evaluate $\hat{\mu}(\theta_i^{T_1})$ and $\hat{\mu}(\theta_i^{T_2})$, corresponding to the sample means on each 
subsequence. 

3. Evaluate consistent spectral density estimates for each subsequence, at frequency 
0, denoted $SD(0; T_1, \theta_i)$ and $SD(0; T_2, \theta_i)$. The spectral density estimator 
considered in this paper is the classical non-parametric periodogram or power spectral 
density estimator. We use Welch's method with a Hanning window; for details 
see Appendix D. 

4. Evaluate the convergence diagnostic given by 

\[ Z_{\tilde{T}} = \frac{\hat{\mu}(\theta_i^{T_1}) - \hat{\mu}(\theta_i^{T_2})}{\sqrt{T_1^{-1}\, SD(0; T_1, \theta_i) + T_2^{-1}\, SD(0; T_2, \theta_i)}} . \]

According to the central limit theorem, as $\tilde{T} \to \infty$ one has that $Z_{\tilde{T}} \to \mathcal{N}(0, 1)$ if 
the sequence $\{\theta_i^{(t)}\}_{t=1:\tilde{T}}$ is stationary. 

— Gelman-Rubin [9] R-statistic diagnostic. This approach to convergence analysis 
requires that one runs multiple parallel independent Markov chains, each starting at 
randomly selected initial starting points (we run five chains). For comparison 
purposes we split the total computational budget of $\tilde{T}$ into $T_1 = T_2 = \ldots = T_5 = \tilde{T}/5$. 
The convergence diagnostic for parameter $\theta_i$ is calculated using the following steps: 

1. Generate five independent Markov chain sequences, producing the chains for 
parameter $\theta_i$ denoted $\{\theta_{i,k}^{(t)}\}_{t=1:T_k}$ for $k \in \{1, \ldots, 5\}$. 

2. Calculate the sample means $\hat{\mu}(\theta_i^{T_k})$ for each sequence and the overall mean 
$\hat{\mu}(\theta_i)$. 

3. Calculate the variance of the sequence means, 
$\frac{1}{4} \sum_{k=1}^{5} \left( \hat{\mu}(\theta_i^{T_k}) - \hat{\mu}(\theta_i) \right)^2 = B_i / T_k$. 

4. Calculate the within-sequence variances $\hat{s}^2(\theta_i^{T_k})$ for each sequence. 

5. Calculate the average within-sequence variance, $\frac{1}{5} \sum_{k=1}^{5} \hat{s}^2(\theta_i^{T_k}) = W_i$. 

6. Estimate the target posterior variance for parameter $\theta_i$ by the weighted linear 
combination $\hat{\sigma}^2(\theta_i) = \frac{T_k - 1}{T_k} W_i + \frac{1}{T_k} B_i$. This estimate is unbiased for samples 
which are from the stationary distribution. In the case in which not all sub chains 
have reached stationarity, this overestimates the posterior variance for a finite $\tilde{T}$ 
but asymptotically, as $\tilde{T} \to \infty$, it converges to the posterior variance. 

7. Improve on the Gaussian estimate of the target posterior given by 
$\mathcal{N}(\hat{\mu}(\theta_i), \hat{\sigma}^2(\theta_i))$ by accounting for sampling variability in the estimates of 
the posterior mean and variance. This can be achieved by making a Student-t 
approximation with location $\hat{\mu}(\theta_i)$, scale $\sqrt{V_i}$ and degrees of freedom $df_i$, each 
given respectively by: 

\[ V_i = \hat{\sigma}^2(\theta_i) + \frac{B_i}{5 T_k} \quad \text{and} \quad df_i = \frac{2 V_i^2}{\widehat{\mathrm{Var}}(V_i)} , \]

where the variance $\widehat{\mathrm{Var}}(V_i)$ is estimated as in Gelman-Rubin [9]. 
Note, the covariance terms are estimated empirically using the within-sequence 
estimates of the mean and variance obtained for each sequence. 

8. Calculate the convergence diagnostic $\sqrt{R} = \sqrt{\frac{V_i \, df_i}{W_i \, (df_i - 2)}}$, where as $\tilde{T} \to \infty$ one 
can prove that $R \to 1$. This convergence diagnostic monitors the scale factor by 
which the current distribution for $\theta_i$ may be reduced if simulations are continued 
for $\tilde{T} \to \infty$. 
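
The Gelman-Rubin steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `gelman_rubin` and the simulated chains are hypothetical, and the Student-t factor $df_i/(df_i - 2)$ of step 8 is deliberately omitted, so the returned value is the simpler ratio $V_i / W_i$.

```python
import numpy as np


def gelman_rubin(chains):
    """Simplified R statistic for one scalar parameter.

    chains: array of shape (m, n), m parallel chains of equal length n
            (post burn-in samples of a single parameter).
    Returns (R, W, B, V); the df/(df - 2) correction is NOT applied.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)            # step 2: per-chain sample means
    b_over_n = chain_means.var(ddof=1)           # step 3: variance of the chain means (= B / n)
    w = chains.var(axis=1, ddof=1).mean()        # steps 4-5: average within-chain variance
    sigma2 = (n - 1) / n * w + b_over_n          # step 6: weighted combination of W and B
    v = sigma2 + b_over_n / m                    # step 7: adds sampling variability of the mean
    return v / w, w, b_over_n * n, v             # step 8 without the Student-t factor


# Hypothetical usage: five well-mixed chains versus five chains stuck at different levels.
rng = np.random.default_rng(0)
mixed = rng.standard_normal((5, 2000))
stuck = mixed + 3.0 * np.arange(5)[:, None]
r_mixed = gelman_rubin(mixed)[0]
r_stuck = gelman_rubin(stuck)[0]
print(r_mixed, r_stuck)
```

For chains that share a common stationary distribution the ratio should be close to one, while chains that disagree about the level of the parameter give a value well above one.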


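The acceptance ingredients listed earlier in this appendix (the city block distance, the hard decision function and the annealed tolerance schedule) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the summary statistics are passed in as plain vectors, and the value used for $\epsilon^{\min}$ is a placeholder, since the paper selects it by a preliminary analysis.

```python
import numpy as np


def tolerance(t, eps_min):
    """Annealed tolerance eps_t = max(20000 - 10 t, eps_min)."""
    return max(20_000.0 - 10.0 * t, eps_min)


def l1_distance(s_obs, s_sim):
    """City block (L1) distance between two summary-statistic vectors."""
    diff = np.asarray(s_obs, dtype=float) - np.asarray(s_sim, dtype=float)
    return float(np.sum(np.abs(diff)))


def hard_decision(s_obs, s_sim, eps):
    """Indicator that the synthetic summaries lie strictly within eps of the observed ones."""
    return l1_distance(s_obs, s_sim) < eps


# Hypothetical usage with a placeholder eps_min = 0.1.
eps_min = 0.1
schedule = [tolerance(t, eps_min) for t in (0, 1999, 2000, 5000)]
accepted = hard_decision([0.0, 1.0, 2.0], [1.0, 1.0, 0.0], eps=3.5)
print(schedule, accepted)
```

With this schedule the tolerance decreases linearly for the first 2,000 iterations and then stays at the floor $\epsilon^{\min}$, which is the point after which the convergence diagnostics are evaluated.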

C. Scaling of statistics in distance metrics 

In the Mahalanobis distance metric, estimation of the scaling weights is given by the covariance 
$\Sigma_{D_I} = \mathrm{Cov}\left( S(D_I; \hat{\mu}, \hat{s}) \mid D_I \right)$, where $\hat{\mu}$ and $\hat{s}$ are the sample mean and standard deviation of $n$ 
i.i.d. residuals $\varepsilon_{i,j}$ (see also (B.1)-(B.2)). Next we outline the estimation of $\Sigma_{D_I}$ by a matrix $\hat{\Sigma}_{D_I}$: 

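• Starting with the elements $\hat{\Sigma}_{D_I}(k, l)$ with $k, l \in \{1, \ldots, n\}$, we obtain from the conditional 
resampling bootstrap 

\[ \mathrm{Cov}\left( C^*_{i,j}, C^*_{i',j'} \mid D_I, \hat{f}^{(CL)}, \hat{\sigma}^{(CL)} \right) = 0 \quad \text{if } i \ne i' \text{ or } j \ne j' , \]

\[ \mathrm{Var}\left( C^*_{i,j} \mid D_I, \hat{f}^{(CL)}, \hat{\sigma}^{(CL)} \right) = \hat{\sigma}^2_{j-1} \, C_{i,j-1} . \]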



• Considering the elements $k \in \{n+1, n+2\}$, $l \in \{1, \ldots, n\}$ and also $k \in \{1, \ldots, n\}$, 
$l \in \{n+1, n+2\}$ of the covariance matrix $\Sigma_{D_I}$, for simplicity we set $\hat{\Sigma}_{D_I}(k, l) = 0$. 

• Considering elements $k, l \in \{n+1, n+2\}$, we assess now $\mathrm{Cov}(\hat{\mu}, \hat{s})$ either analytically or 
numerically by simulation of appropriate i.i.d. residuals. 

Parametric Approximation 

— In approximating $\hat{\mu}$ and $\hat{s}$ we assume i.i.d. samples $\varepsilon_{i,j} \sim \mathcal{N}(0, 1)$. 

- Using the assumptions we know that: 
$\mathrm{Var}(\hat{\mu}) = \frac{1}{n}$, 
Var(S) = ^ [(l + £ + E « =1 Var (g*)] = ^[2»(1 + £)]> 
Cov(M,S} = ^ TF [l-i]. 

- Under these assumptions: 

1. If the distribution of $\varepsilon_{i,j}$ is skewed then it is more appropriate to do a numerical 
approximation with the observed residuals from the bootstrap algorithm. 

2. The precision $\epsilon_t$ from the MCMC-ABC algorithm should depend on the size of the 
claims triangle, that is, the number of residuals $n$. 
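
The numerical route mentioned above, assessing the covariance of the sample mean and sample standard deviation by simulating i.i.d. residuals, might look like the sketch below. It assumes standard normal residuals, as in the parametric approximation; the function name, the number of replications and the choice $n = 45$ are illustrative assumptions and not values taken from the paper.

```python
import numpy as np


def mean_sd_covariance(n, n_rep=20_000, seed=0):
    """Monte Carlo estimate of the 2x2 covariance matrix of (sample mean, sample sd)
    for n i.i.d. N(0, 1) residuals."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, n))    # each row is one simulated set of residuals
    mu_hat = eps.mean(axis=1)                # sample mean of each set
    s_hat = eps.std(axis=1, ddof=1)          # sample standard deviation of each set
    return np.cov(np.vstack([mu_hat, s_hat]))


# Hypothetical usage: n = 45 residuals.
cov = mean_sd_covariance(45)
print(cov)
```

The top-left entry should be close to $1/n$. For Gaussian residuals the sample mean and sample standard deviation are independent, so the off-diagonal entry should be close to zero; for skewed residuals one would replace the normal draws by resampled observed residuals.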

D. Estimating the Spectral Density 

This is calculated via a modified technique using Welch's method; see Proakis-Manolakis [23], 
pp. 910-913. This involves performing the following steps: 

• Split each sequence $\{\theta_i^{(t)}\}_{t=1:T_1}$ and $\{\theta_i^{(t)}\}_{t=T^*:\tilde{T}}$ into $L = 20$ non-overlapping blocks of 
length $N$. 

• Apply a Hanning window function $w(t) = 0.5 \left[ 1 - \cos\left( \frac{2 \pi t}{N - 1} \right) \right]$ to the samples of the 
Markov chain in each block. 

• Take the discrete Fourier transform (DFT) of each windowed block, denoted $\Theta_l(k)$. 

• Estimate the spectral density (SD) as $SD(\omega_k) = \frac{1}{L} \sum_{l=0}^{L-1} |\Theta_l(k)|^2$. 
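
As an illustration of how the block-averaged periodogram feeds into the Geweke diagnostic of Appendix B, the following sketch combines the two. It is written under several assumptions that are not taken from the paper: the series is demeaned before windowing, the periodogram is normalised by the sum of squared window weights, and the segment fractions 0.1 and 0.5 are the conventional Geweke choices. The function names are hypothetical.

```python
import numpy as np


def welch_sd(x, n_blocks=20):
    """Averaged, Hann-windowed periodogram of a demeaned series.

    Returns one value per DFT frequency; element 0 is the estimate at frequency zero.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # demean: frequency zero then measures fluctuation, not level
    n = len(x) // n_blocks                    # block length N
    blocks = x[: n * n_blocks].reshape(n_blocks, n)
    w = np.hanning(n)                         # 0.5 * (1 - cos(2 pi t / (N - 1)))
    dft = np.fft.rfft(blocks * w, axis=1)     # DFT of each windowed block
    # Dividing by sum(w**2) makes white noise of variance s2 return roughly s2.
    return (np.abs(dft) ** 2).mean(axis=0) / np.sum(w ** 2)


def geweke_z(chain, frac_first=0.1, frac_last=0.5):
    """Z score comparing the means of an early and a late segment of the chain."""
    chain = np.asarray(chain, dtype=float)
    t1 = int(frac_first * len(chain))
    t2 = int(frac_last * len(chain))
    first, last = chain[:t1], chain[-t2:]
    var = welch_sd(first)[0] / t1 + welch_sd(last)[0] / t2
    return (first.mean() - last.mean()) / np.sqrt(var)


# Hypothetical usage: a stationary series and the same series with a slow drift added.
rng = np.random.default_rng(1)
stationary = rng.standard_normal(20_000)
drifting = stationary + np.linspace(0.0, 3.0, stationary.size)
z_stationary = geweke_z(stationary)
z_drifting = geweke_z(drifting)
print(z_stationary, z_drifting)
```

A stationary chain should give a Z score of moderate size, whereas a chain whose level is still moving gives a score far out in the tails of the standard normal.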

Year               0         1         2         3         4         5         6         7         8         9
0             248.97    299.47    357.00    418.61    473.63    563.35    693.22    796.84    914.95  1,084.24
1             186.72    201.99    227.23    271.18    305.16    379.37    466.16    554.30    660.75
2             172.58    207.48    250.37    304.44    356.92    417.60    477.99    542.25
3             195.19    229.06    290.83    320.11    367.60    469.93    543.40
4             131.00    168.50    198.18    219.26    270.00    344.63
5             163.58    181.16    222.10    246.78    303.00
6             294.30    373.08    477.16    566.20
7             529.31    577.71    805.95
8             249.00    321.83
9             140.41
f_j              1.2       1.2       1.2       1.2       1.2       1.2       1.2       1.2       1.2       1.2
sigma_j^2          1         1         1         1         1         1         1         1         1         1

Table 2: Synthetic Data - Cumulative claims $C_{i,j}$ for each accident year $i$ and development year $j$, $i + j \le I$. 



Year           0         1         2         3         4         5         6         7         8         9
0       594.6975  372.1236   89.5717   20.7760   20.6704    6.2124    6.5813    1.4850    1.1130    1.5813
1       634.6756  324.6406   72.3222   15.1797    6.7824    3.6603    5.2752    1.1186    1.1646
2       626.9090  297.6223   84.7053   26.2768   15.2703    6.5444    5.3545    0.8924
3       586.3015  268.3224   72.2532   19.0653   13.2976    8.8340    4.3329
4       577.8885  274.5229   65.3894   27.3395   23.0288   10.5224
5       618.4793  282.8338   57.2765   24.4899   10.4957
6       560.0184  289.3207   56.3114   22.5517
7       528.8066  244.0103   52.8043
8       529.0793  235.7936
9       567.5568

Table 3: Real Data - Incremental claims $Y_{i,j} = C_{i,j} - C_{i,j-1}$ for each accident year $i$ and development year $j$, 
$i + j \le I$. 

DFCL model                                  j = 0          j = 1          j = 2          j = 3          j = 4          j = 5          j = 6          j = 7          j = 8
f_j                                          1.20           1.20           1.20           1.20           1.20           1.20           1.20           1.20           1.20
f_j^(CL)                           1.20 (2.40E-2) 1.22 (3.27E-2) 1.16 (2.46E-2) 1.17 (2.44E-2) 1.23 (2.63E-2) 1.19 (2.78E-2) 1.16 (2.59E-2) 1.17 (2.10E-2) 1.19 (2.51E-2)
f_j^(MAP) | sigma_{0:J-1}             1.07 (0.02)    1.19 (0.02)    1.05 (0.02)    1.04 (0.02)    1.10 (0.02)    1.08 (0.02)    0.97 (0.02)    1.19 (0.03)    1.14 (0.04)
f_j^(MMSE) | sigma_{0:J-1}         1.19 (1.34E-2) 1.21 (1.38E-2) 1.18 (1.27E-2) 1.19 (1.30E-2) 1.17 (1.37E-2) 1.18 (1.53E-2) 1.20 (1.60E-2) 1.18 (1.73E-2) 1.19 (2.35E-2)
Std. dev. of f_j | sigma_{0:J-1}   0.23 (4.00E-3)  0.22 (3.1E-3)  0.20 (3.1E-3)  0.21 (3.2E-3)  0.22 (3.9E-3) 0.27 (1.01E-2) 0.35 (1.24E-2) 0.44 (1.41E-2) 0.70 (1.60E-2)
[q_0.05, q_0.95] | sigma_{0:J-1}      [0.75,1.50]    [0.77,1.50]    [0.76,1.41]    [0.75,1.44]    [0.82,1.51]    [0.78,1.52]    [0.65,1.60]    [0.46,1.79]    [0.25,2.50]
f_j^(MAP)                             1.15 (0.02)    1.13 (0.02)    1.06 (0.02)    1.09 (0.02)    1.15 (0.02)    1.19 (0.02)    1.12 (0.03)    1.08 (0.03)    1.06 (0.04)
f_j^(MMSE)                            1.19 (0.01)    1.18 (0.01)    1.17 (0.01)    1.18 (0.01)    1.16 (0.01)    1.20 (0.02)    1.18 (0.03)    1.16 (0.02)    1.20 (0.02)
Std. dev. of f_j                    0.24 (5.1E-3)  0.24 (4.4E-3)  0.23 (5.0E-3)  0.26 (5.8E-3)  0.25 (5.6E-3)  0.25 (5.7E-3)    0.40 (0.01)    0.49 (0.02)    0.68 (0.02)
[q_0.05, q_0.95]                      [0.66,1.48]    [0.74,1.54]    [0.67,1.42]    [0.65,1.47]    [0.74,1.50]    [0.74,1.50]    [0.22,1.54]    [0.35,1.95]     [0.1,2.50]
Ave[A(theta_{1:2J}, f_j)]                    0.21           0.21           0.19           0.22           0.25           0.21           0.22           0.20           0.24
sigma_j^2                                       1              1              1              1              1              1              1              1              1
sigma_j^2(CL)                         1.02 (0.29)    0.75 (1.44)    0.51 (1.02)    0.49 (0.91)    0.71 (1.18)    0.72 (1.89)    0.25 (1.84)    0.31 (1.40)    0.25 (0.77)
sigma_j^2(MAP)                        0.58 (0.06)    0.96 (0.06)    0.54 (0.05)    0.78 (0.05)    0.78 (0.05)    0.81 (0.04)    0.61 (0.04)    0.79 (0.04)    0.56 (0.04)
sigma_j^2(MMSE)                       1.11 (0.03)    1.18 (0.03)    1.14 (0.04)    1.31 (0.03)    1.29 (0.03)    1.19 (0.02)    1.16 (0.03)    1.14 (0.03)    1.05 (0.02)
Std. dev. of sigma_j^2                0.83 (0.02)    0.79 (0.02)    0.82 (0.02)    0.80 (0.02)    0.79 (0.02)    0.72 (0.02)    0.77 (0.02)    0.78 (0.02)    0.71 (0.02)
[q_0.05, q_0.95]                      [0.33,2.89]    [0.33,2.79]    [0.25,2.91]    [0.32,2.87]    [0.33,2.82]    [0.27,2.59]    [0.21,2.66]    [0.17,2.62]    [0.22,2.42]
Ave[A(theta_{1:2J}, sigma_j)]                0.23           0.24           0.24           0.23           0.24           0.24           0.24           0.24           0.25

Table 4: Comparison of Bayesian estimates for the chain ladder factors and variances versus classical estimates, in the case of synthetic data. Numerical standard 
errors in estimates are presented in brackets. 



i  Estimator       j = 1      j = 2      j = 3      j = 4      j = 5      j = 6      j = 7      j = 8      j = 9    Reserve
0  f^(CL)              -          -          -          -          -          -          -          -          -          -
0  f^(MMSE)            -          -          -          -          -          -          -          -          -          -
1  f^(CL)              -          -          -          -          -          -          -          - 10,663,318     15,126
1  f^(MMSE)            -          -          -          -          -          -          -          - 10,663,099     14,907
2  f^(CL)              -          -          -          -          -          -          - 10,646,884 10,662,008     26,257
2  f^(MMSE)            -          -          -          -          -          -          - 10,646,386 10,661,291     25,541
3  f^(CL)              -          -          -          -          -          -  9,734,574  9,744,764  9,758,606     34,538
3  f^(MMSE)            -          -          -          -          -          -  9,734,765  9,744,500  9,758,143     34,074
4  f^(CL)              -          -          -          -          -  9,837,277  9,847,906  9,858,214  9,872,218     85,302
4  f^(MMSE)            -          -          -          -          -  9,835,850  9,846,669  9,856,516  9,870,315     83,400
5  f^(CL)              -          -          -          - 10,005,044 10,056,528 10,067,393 10,077,931 10,092,247    156,494
5  f^(MMSE)            -          -          -          - 10,005,302 10,055,329 10,066,390 10,076,456 10,090,563    154,811
6  f^(CL)              -          -          -  9,419,776  9,485,469  9,534,279  9,544,580  9,554,571  9,568,143    286,121
6  f^(MMSE)            -          -          -  9,400,832  9,466,638  9,513,971  9,524,436  9,533,961  9,547,308    265,286
7  f^(CL)              -          -  8,445,057  8,570,389  8,630,159  8,674,568  8,683,940  8,693,030  8,705,378    449,167
7  f^(MMSE)            -          -  8,437,023  8,545,017  8,604,832  8,647,856  8,657,369  8,666,026  8,678,159    421,947
8  f^(CL)              -  8,243,496  8,432,051  8,557,190  8,616,868  8,661,208  8,670,566  8,679,642  8,691,971  1,043,242
8  f^(MMSE)            -  8,236,916  8,417,305  8,525,046  8,584,722  8,627,645  8,637,136  8,645,773  8,657,877  1,009,148
9  f^(CL)      8,470,989  9,129,696  9,338,521  9,477,113  9,543,206  9,592,313  9,602,676  9,612,728  9,626,383  3,950,814
9  f^(MMSE)    8,467,380  9,118,521  9,318,217  9,437,490  9,503,553  9,551,070  9,561,577  9,571,138  9,584,538  3,908,970

Parameter          j = 0      j = 1      j = 2      j = 3      j = 4      j = 5      j = 6      j = 7      j = 8  Total reserve
f_j^(CL)          1.4925     1.0778     1.0229     1.0148     1.0070     1.0051     1.0011     1.0010     1.0014      6,047,061
sigma_j^(CL)     135.253     33.803     15.760     19.847      9.336      2.001      0.823      0.219      0.059
f_j^(MMSE)        1.4919     1.0769     1.0219     1.0128     1.0070     1.0050     1.0011     1.0010     1.0014      5,918,083
sigma_j^(MMSE)   154.221     33.000     16.770     22.397      8.300      2.166      0.720      0.158      0.041

Table 5: Predicted cumulative CL claims $\widehat{C}_{i,j}$ for actual data and estimated CL reserves $\widehat{C}_{i,J} - C_{i,I-i}$ under the classical and Bayesian DFCL models. 
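
The classical entries of Table 5 can be checked against the incremental claims of Table 3 with a short chain ladder calculation. The sketch below is illustrative only: the helper name `cl_factors` is hypothetical, and the factor $10^4$ used to compare with Table 5 is an assumption based on the apparent scaling between the two tables.

```python
import numpy as np

# Incremental claims from Table 3 (rows: accident years 0..9).
incremental = [
    [594.6975, 372.1236, 89.5717, 20.7760, 20.6704, 6.2124, 6.5813, 1.4850, 1.1130, 1.5813],
    [634.6756, 324.6406, 72.3222, 15.1797, 6.7824, 3.6603, 5.2752, 1.1186, 1.1646],
    [626.9090, 297.6223, 84.7053, 26.2768, 15.2703, 6.5444, 5.3545, 0.8924],
    [586.3015, 268.3224, 72.2532, 19.0653, 13.2976, 8.8340, 4.3329],
    [577.8885, 274.5229, 65.3894, 27.3395, 23.0288, 10.5224],
    [618.4793, 282.8338, 57.2765, 24.4899, 10.4957],
    [560.0184, 289.3207, 56.3114, 22.5517],
    [528.8066, 244.0103, 52.8043],
    [529.0793, 235.7936],
    [567.5568],
]
I = len(incremental) - 1

# Cumulative triangle; unobserved cells are NaN.
C = np.full((I + 1, I + 1), np.nan)
for i, row in enumerate(incremental):
    C[i, : len(row)] = np.cumsum(row)


def cl_factors(C):
    """Classical chain ladder factors f_j = sum_i C[i, j+1] / sum_i C[i, j]."""
    I = C.shape[0] - 1
    f = np.empty(I)
    for j in range(I):
        rows = slice(0, I - j)               # accident years with both columns observed
        f[j] = C[rows, j + 1].sum() / C[rows, j].sum()
    return f


f = cl_factors(C)

# Complete the lower triangle and compute the reserves for accident years 1..I.
C_hat = C.copy()
for i in range(1, I + 1):
    for j in range(I - i, I):
        C_hat[i, j + 1] = C_hat[i, j] * f[j]
latest = np.array([C[i, I - i] for i in range(1, I + 1)])
reserves = C_hat[1:, I] - latest

print(np.round(f, 4))
print(np.round(reserves * 1e4))              # assumed 10^4 scaling for comparison with Table 5
```

Under this scaling assumption, the first development factor and the reserve for accident year 1 agree with the classical row of Table 5 up to rounding.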



Accident year i                           1          2          3          4          5          6          7          8          9      Total
Frequentist bootstrap
  (process variance)^{1/2}              192        740      2,668      6,831     30,474     68,207     80,071    126,952    389,768    424,361
  (estimation error)^{1/2}              503      1,560      3,059     12,639     25,761     20,776     33,771     41,554    108,547    157,680
  (msep^{freq})^{1/2}                   538      1,727      4,059     14,367     39,904     71,301     86,901    133,580    404,601    452,708
  Vco (%)                             3.61%      6.76%     11.91%     17.02%     25.61%     25.00%     19.38%     12.81%      9.93%      7.49%
Bayesian MCMC-ABC
  (process variance)^{1/2}              134        533      2,307      7,185     27,367     74,235     86,404    129,038    437,482    470,982
  (estimation error)^{1/2}              224        894      1,801      4,327     15,819     29,861     32,243     49,198    152,879    211,633
  (msep^{Bayes})^{1/2}                  261      1,040      2,927      8,387     31,610     80,016     92,224    138,099    463,425    504,934
  Vco (%)                         (values illegible in source)
  VaR_{0.95}^{Bayes}                    554      2,183      5,632     15,820     61,122    152,531    173,665    161,619    816,701    910,757
  VaR^{Bayes}, higher level             726      2,918      7,430     22,515     79,472    201,322    228,448    211,125  1,278,665  1,454,966
Credibility
  (process variance)^{1/2}              192        740      2,668      6,831     30,474     68,207     80,071    126,952    389,769    424,362
  (estimation error)^{1/2}              188        534      1,493      3,391     13,515     27,284     29,674     43,901    129,764    185,015
  (msep^{cred})^{1/2}                   269        913      3,057      7,627     33,337     73,462     85,392    134,329    410,802    462,941
  Vco (%)                             1.81%      3.58%      8.97%      9.04%     21.40%     25.77%     19.04%     12.88%     10.40%      7.82%

The VaR rows refer to $C_{i,J} - \mathrm{E}[C_{i,J} \mid D_I]$ conditional on $D_I$. 

Table 6: Comparison of the frequentist's bootstrap $\mathrm{msep}^{freq}$, the Bayesian MCMC-ABC $\mathrm{msep}^{Bayes}$ and the credibility $\mathrm{msep}^{cred}$. The coefficient of variation is as 
defined in Wüthrich-Merz [30]. 
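
As a consistency check on Table 6, and assuming that the first two rows of each block are the square roots of the process variance and of the estimation error, the third row should be the root of the sum of their squares. For accident year 1 this gives, for the frequentist, Bayesian and credibility blocks respectively:

\[ \sqrt{192^2 + 503^2} = \sqrt{289{,}873} \approx 538 , \qquad \sqrt{134^2 + 224^2} = \sqrt{68{,}132} \approx 261 , \qquad \sqrt{192^2 + 188^2} = \sqrt{72{,}208} \approx 269 , \]

which agrees with the tabulated values 538, 261 and 269.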



[Figure 1: plot not reproduced. Panels show the estimated ACF of the Markov chain for each distance metric; horizontal axis: $\tau$ (lag for ACF).] 

Figure 1: Estimated Autocorrelation Function (ACF) for parameters $F_0$ and $\Sigma_0$. 



[Figure 2: plot not reproduced. Panels show the estimated Z scores for the posterior mean of each parameter as a function of distance metric; horizontal axis: length of Markov chain = (multiples of 5,000 samples) + T.] 

Figure 2: Estimated Z scores for the posterior mean of parameters $F_0$ and $\Sigma_0$ as a function of the length 
of the Markov chain $T$. 



[Figure 3: plot not reproduced. Panels show the estimated R statistic for each parameter under the Scaled Euclidean, Mahalanobis and City Block distance metrics; horizontal axis: length of Markov chain = (multiples of 100 samples) + T.] 

Figure 3: Estimated R statistic for parameters $F_0$ and $\Sigma_0$ as a function of the length of the Markov chain 
$\tilde{T}$. 
[Figure 4: plot not reproduced. Top panel: MCMC-ABC estimated distribution of chain ladder factors $F$. Bottom panel: MCMC-ABC estimated distribution of chain ladder standard deviations.] 

Figure 4: Box-Whisker plots of parameters $F$ and $\Sigma$ with each box marking the 25th, 50th and 75th percentiles. 
Top: 200,000 MCMC-ABC samples to estimate the posterior for $F$. The sample mean and mode are denoted 
by '*' and 'o' respectively. The classical estimators $\hat{f}^{(CL)}$ are denoted by 'A'. Bottom: 200,000 MCMC-ABC 
samples to estimate the posterior for $\Sigma$. The sample mean and mode are denoted by '*' and 'o' 
respectively. The classical estimators $\hat{\sigma}^{(CL)}$ are denoted by 'A'. 



[Figure 5: plot not reproduced. Panel title: Empirical predictive distribution of ultimate claims via MCMC-ABC; horizontal axis: accident year $i$.] 

Figure 5: Box-Whisker plots of the predictive distribution of cumulative ultimate claims $C_{i,J}$ with the box 
marking the 25th, 50th and 75th percentiles; see also Table 6. The mean predicted ultimate claims under 
a Bayesian approach (using MMSE point estimates) are marked with '*', the predicted mode for the 
ultimate claims (using MAP point estimates) is marked with 'o' and the mean predicted ultimate claims 
under the DFCL classical method are marked with 'A'. 
[Figure 6: plot not reproduced.] 

Figure 6: Distribution of the chain ladder factor $F_0$ as a function of tolerance. 